AD-A183 772  SPECIFICATION AND DESIGN METHODOLOGIES FOR HIGH-SPEED
FAULT-TOLERANT ARRA.. (U) CALIFORNIA UNIV LOS ANGELES
DEPT OF COMPUTER SCIENCE  M D ERCEGOVAC ET AL.  JUN 87
UNCLASSIFIED  N00014-83-K-0493  F/G 9/1


AD-A183 772



FINAL  REPORT 


SPECIFICATION  AND  DESIGN  METHODOLOGIES  FOR 
HIGH-SPEED  FAULT-TOLERANT  ARRAY  ALGORITHMS  AND  STRUCTURES 

FOR  VLSI 


Office  of  Naval  Research 
Contract  No.  N00014-83-K-0493 


Principal  Investigator 
Milos D. Ercegovac


Co-Principal Investigator
Algirdas  Avizienis 




Faculty  Associate: 
Tomas  Lang 


UCLA  Computer  Science  Department 
University  of  California,  Los  Angeles 
Los  Angeles,  California  90024 
(213)  825-2660 


June  1987 




Table  of  Contents 


1. Summary of the Project Objectives

2. Summary of Contributions: Task 1

2.1 Introduction

2.2 Specification of Hardware Functions and Algorithms in vFP

2.3 Obtaining Layouts from FP

2.4 Evaluation of Designs

2.5 Interfacing vFP Design System with Existing VLSI CAD Tools

2.6 Compiler Research and Developments

2.7 Algorithms for VLSI Implementation

2.8 Future Research

3. Summary of Contributions: Task 2

4. Publications Resulting from This Project


Appendices:  Selected  Publications 

1. D.R. Patel, M. Schlag, and M.D. Ercegovac,
"vFP: An Environment for the Multi-Level Specification, Analysis, and Synthesis of Hardware Algorithms"

2. F. Meshkinpour and M.D. Ercegovac,
"A Functional Language for Description and Design of Digital Systems: Sequential Constructs"

3. A. Avizienis, "Arithmetic Algorithms for Operands Encoded in Two-Dimensional Low-Cost Arithmetic Error Codes"

4. M.D.F. Schlag, "Layout from a Topological Description" (Abstract)

5. J. Moreno, "A Proposal for the Systematic Design of Arrays for Matrix Computations" (Abstract)


*Reports will be submitted upon request




1.  SUMMARY  OF  THE  PROJECT  OBJECTIVES 

For  convenience  we  summarize  here  the  project  objectives  as  stated  in  the  research 
proposal.  This  research  in  the  methodologies  for  the  specification  and  design  of  high-speed, 
fault-tolerant VLSI array structures has two related objectives: (1) a high-level language
approach  to  the  specification  and  simulation  of  VLSI  algorithms  and  networks  using  a 
functional-style  (LISP-like)  language  (Task  1),  and  (2)  cost-effective  methods  to  introduce 
fault-tolerance  (error  detection,  fault  location,  retry,  and  reconfiguration)  into  VLSI- 
implemented  systolic  systems  and  similar  computing  arrays  (Task  2). 

Task 1: Functional Language Approach to VLSI CAD♦

Principal  Investigator:  Milos  D.  Ercegovac 

The  major  goals  are  the  development  and  implementation  of  a  functional-style  language 
for  specification  of  VLSI  structures  which  allows  multilevel  simulation,  performance  analysis, 
algebraic transformation techniques, and layout planning. The proposed high-level functional-style
language approach provides a clean separation of functional and structural specifications,
and  supports  strongly  the  multi-level,  hierarchical  design;  the  language  is  executable  at  any  level 
of  abstraction  to  allow  for  early  evaluation  and  checking  of  designs;  it  has  an  efficient  and 
comprehensive  built-in  performance  evaluation  mechanism  which  allows  a  selective 
performance  observation;  and  it  supports  a  semi-automatic  design  methodology  under 
implementation  constraints  and  system  requirements. 


Task 2: Fault-Tolerance in VLSI Systolic Arrays

Principal Investigator: Algirdas Avizienis

A  systolic  VLSI  system  consists  of  a  set  of  interconnected  cells,  and  information 
between  the  cells  flows  in  a  pipelined  fashion.  To  provide  fault-tolerance  for  a  systolic  system, 
the  most  fundamental  requirement  is  to  provide  an  effective  and  low-cost  method  for  the 
immediate  detection  of  errors  that  occur  in  the  numerical  information  that  is  generated  by  the 
cells  and  forwarded  to  other  cells  or  to  I/O  ports.  The  occurrence  of  transient  malfunctions  and 
the complexity of structure of a VLSI systolic system rule out periodically applied diagnostic
tests  as  an  effective  fault  detection  method.  The  remaining  approach  is  to  provide  concurrent 
error  detection  that  takes  place  side-by-side  with  regular  computation  whenever  the  systolic 
system  is  carrying  out  its  activity. 


♦ This research was also supported in part by the State of California MICRO Program


2. SUMMARY OF CONTRIBUTIONS: Task 1


This  task  deals  with  a  study  and  development  of  a  high-level  language  approach  in 
specification,  simulation,  performance  evaluation  and  chip  layout  planning  for  VLSI  digital 
systems.  A  high-level  applicative  (functional)  language,  implemented  at  UCLA,  allows 
combining  of  top-down  techniques  of  functional  and  structural  specification  of  systems  with 
bottom-up  specification  of  implementation  constraints  such  as  the  size  of  circuit  layout  and 
wiring  patterns.  It  also  provides  highly  modular  algebraic  specification  of  digital  systems 
suitable for formal transformations, simulation, and a powerful method of topological
interpretation which generates diagrams at any level of abstraction. Several versions of the
language, the simulation and performance evaluation tools and a graphics interface have been
developed and implemented on the DEC VAX 11/750 under the UNIX operating system.


2.1  Introduction 

The  complexity  of  VLSI  requires  the  application  of  CAD  tools  at  all  levels  of  the  design 
process.  In  order  to  be  effective,  these  tools  must  be  adaptive  to  the  specific  design.  In  this 
project  we  studied  a  design  method  based  on  the  use  of  applicative  languages  [Bac78]  for  the 
specification,  evaluation  and  synthesis  of  hardware  algorithms.  A  functional  language  for 
specification of hardware systems is attractive because it provides both behavioral and
structural  information  about  a  circuit  implementing  the  system  [Lah81,  Joh84,  Mes84,  Pat85, 
She84,  Sch86,  Wor86].  As  a  consequence,  a  behavioral  specification  implies  a  topology  of  the 
circuit  which  allows  generation  of  "abstract"  layouts.  These  layouts  are  refined  by  introducing 
geometrical  constraints  to  produce  physical  layouts. 

Our  methodology  is  supported  by  a  set  of  tools  developed  at  UCLA.  The  goal  of  the 
system  is  to  provide  designers  with  an  environment  in  which  they  can  rapidly  explore  various 
alternative  designs.  Thus  it  is  possible  to  specify  the  algorithm  at  any  level  of  abstraction  and 
have  the  system  rapidly  evaluate  certain  parameters  (e.g.,  delays)  and  provide  feedback  in  the 
form  of  automatically  generated  floorplans.  The  advantage  of  using  an  applicative  language  is 
that  it  ties  together  the  specification  of  the  algorithm,  the  synthesis  of  the  circuit,  and  the 
evaluation  of  the  implementation.  The  algebraic  basis  of  FP  allows  formal  transformations  of 
the  specification  to  improve  the  layout  without  changing  its  function. 


2.2  Specification  of  Hardware  Functions  and  Algorithms  in  vFP 

A  program  in  a  functional  language  is  a  function  that  maps  objects  into  objects.  Objects 
are  either  atoms  (numbers  or  strings)  or  sequences  of  objects.  There  is  a  special  atom  ?  denoting 
an  undefined  value.  Any  sequence  that  contains  ?  is  undefined.  The  language  includes  primitive 
functions,  functional  (combining)  forms,  and  means  of  defining  functions.  A  computation  is 
invoked  by  applying  the  function  to  an  object.  There  are  no  variables  and  all  FP  programs  are 
generic,  i.e.,  independent  of  the  size  of  their  arguments. 


The  FP  language  we  use  is  based  on  Backus’  FP  [Bac78]  with  the  following  additions: 
parameters  to  function  definitions  are  allowed;  both  the  infix  and  prefix  modes  for  the 
arithmetic,  logical,  and  predicate  functions  can  be  used;  there  are  additional  primitives  and 
functional  forms;  and  extensions  for  the  specification  of  sequential  systems  are  introduced 
[Mes84,  Mes85,  Pat85,  Sch86,  Wor86].  The  primitive  functions  map  objects  to  objects.  They 
include 

arithmetic          +:<1 5> -> 6              *:<2 5> -> 10

logical             andg:<1 0> -> 0           org:<0 0> -> 0

predicate           atom:<a b c> -> F         =:<12 12> -> T

selector            2:<1 a 3 5 b> -> a        last:<4 3 2 1> -> 1

and structure modifying functions such as

transpose           trans:<<1 2 3> <4 5 6>> -> <<1 4> <2 5> <3 6>>

append left         apndl:<a <b c d>> -> <a b c d>

distribute right    distr:<<1 2 3> 4> -> <<1 4> <2 4> <3 4>>

Functional  forms  map  functions  or  objects  to  functions.  For  example, 

compose         f@g:x -> f:(g:x)

construct       [f,g,h]:x -> <f:x g:x h:x>

apply to all    &f:<a b c> -> <f:a f:b f:c>

constant        %k:x -> k if x is not ?

right insert    !f:<a b c d> -> f:<a !f:<b c d>>

A  computation  in  a  digital  system  consists  of  moving  and  transforming  data  according  to 
some  precedence  relation.  The  language  provides  explicit  means  for  specifying  precedences  and 
concurrency  (@  and  []  functional  forms,  for  example),  computational  functions  (e.g.,  logical 
primitives),  and  routing  functions  (e.g.,  selectors).  Since  a  unit  of  information  represented  by  an 
atom  depends  on  the  level  of  abstraction,  hierarchical  specifications  are  natural.  Therefore,  FP  is 
suited  for  describing  hardware  functions  and  algorithms. 
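
The primitives and functional forms listed above can be mimicked in a few lines of Python. The sketch below is only an illustration and not UCLA FP itself: sequences are modeled as tuples, the undefined value ? is not modeled, and every Python name (`compose`, `construct`, `apply_to_all`, `constant`, `right_insert`, `sel`) is an assumption made for this example.

```python
# Illustrative model of a few FP primitives and functional forms.
# Sequences are Python tuples; the undefined atom "?" is not modeled.

def compose(*fs):
    """f@g@h: apply the rightmost function first."""
    def run(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return run

def construct(*fs):
    """[f,g,h]:x -> <f:x g:x h:x>"""
    return lambda x: tuple(f(x) for f in fs)

def apply_to_all(f):
    """&f:<a b c> -> <f:a f:b f:c>"""
    return lambda xs: tuple(f(x) for x in xs)

def constant(k):
    """%k:x -> k"""
    return lambda x: k

def right_insert(f):
    """!f:<a b c d> -> f:<a !f:<b c d>> (non-empty sequences only)"""
    def run(xs):
        if len(xs) == 1:
            return xs[0]
        return f((xs[0], run(xs[1:])))
    return run

def sel(i):
    """Selector i picks the i-th element, counting from 1."""
    return lambda xs: xs[i - 1]

# A few primitives, each taking one sequence argument as in FP.
add = lambda p: p[0] + p[1]
trans = lambda rows: tuple(zip(*rows))
apndl = lambda p: (p[0],) + tuple(p[1])
distr = lambda p: tuple((x, p[1]) for x in p[0])

print(trans(((1, 2, 3), (4, 5, 6))))
print(right_insert(add)((1, 2, 3, 4)))
```

Because every function takes exactly one object and returns one object, composition and construction are the only "wiring" needed, which is what makes the later extraction of a circuit topology possible.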

For  example,  FP  specifications  for  the  following  primitive  functions  used  in  the  design 
of  a  carry-save  array  multiplier  are 

/* FA*:<<a b> <y x>> -> <<c x> s> where 2c+s=a+b+yx */
defun FA*
    [[org@[1,1@2],3],2@2]
    @[1@1,HADD@[2@1,2],3]
    @[HADD@1,andg@2,2@2]
enddef

/* HA*:<<a> <y x>> -> <<c x> s> where 2c+s=a+yx */
defun HA*
    [[1@1,2],2@1]@[HADD@[1@1,andg@2],2@2]
enddef

/* FA**:<<a b> <x y>> -> <c s> where 2c+s=a+b+yx */
defun FA**
    [org@[1,1@2],2@2]
    @[1@1,HADD@[2@1,2]]
    @[HADD@1,andg@2]
enddef

/* FA:<<a b> c> -> <c s> where 2c+s=a+b+c */
defun FA
    [org@[1,1@2],2@2]
    @[1@1,HADD@[2@1,2]]
    @[HADD@1,2]
enddef

/* HADD:<a b> -> <c s> where 2c+s=a+b */
defun HADD
    [andg,xorg]
enddef

By symbolically executing these specifications, it is possible to extract the corresponding
topological structure and produce the sketches of functions FA*, HA* and FA** as shown in
Figure 1. Obtaining layouts from FP expressions is discussed in Section 2.3.
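
The cell specifications can be checked against their stated contracts by transcribing them into an executable model. The following Python sketch is an assumption-laden transcription, not the UCLA FP system: `compose`, `construct`, and `sel` are hypothetical helpers, sequences are tuples, and HADD is modeled as AND for the carry and XOR for the sum, as its contract 2c+s=a+b requires.

```python
from itertools import product

def compose(*fs):
    """f@g: apply the rightmost function first."""
    def run(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return run

def construct(*fs):
    """[f,g]:x -> <f:x g:x>"""
    return lambda x: tuple(f(x) for f in fs)

def sel(i):
    """Selector i (1-based)."""
    return lambda xs: xs[i - 1]

s1, s2, s3 = sel(1), sel(2), sel(3)
andg = lambda p: p[0] & p[1]
org = lambda p: p[0] | p[1]
xorg = lambda p: p[0] ^ p[1]

HADD = construct(andg, xorg)          # <a b> -> <carry sum>

# FA*: <<a b> <y x>> -> <<c x> s>, transcribed stage by stage.
FA_star = compose(
    construct(construct(compose(org, construct(s1, compose(s1, s2))), s3),
              compose(s2, s2)),
    construct(compose(s1, s1),
              compose(HADD, construct(compose(s2, s1), s2)),
              s3),
    construct(compose(HADD, s1), compose(andg, s2), compose(s2, s2)),
)

# HA*: <<a> <y x>> -> <<c x> s>
HA_star = compose(
    construct(construct(compose(s1, s1), s2), compose(s2, s1)),
    construct(compose(HADD, construct(compose(s1, s1), compose(andg, s2))),
              compose(s2, s2)),
)

def check_cells():
    """Exhaustively verify 2c+s = a+b+yx (FA*) and 2c+s = a+yx (HA*)."""
    for a, b, y, x in product((0, 1), repeat=4):
        (c, x_out), s = FA_star(((a, b), (y, x)))
        assert 2 * c + s == a + b + y * x and x_out == x
        (c, x_out), s = HA_star(((a,), (y, x)))
        assert 2 * c + s == a + y * x and x_out == x
    return True

print(check_cells())
```

Each cell passes the multiplicand bit x through unchanged, which is what lets the cells be tiled into an array.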

There  is  no  concept  of  state  in  an  FP  program  and,  consequently,  there  is  no  history  of 
execution.  All  information  needed  by  a  computation  must  be  specified  as  the  input  to  the 
corresponding  function.  A  sequential  system  could  be  described  by  a  function  which  passes  its 
state as an argument back to itself. This, however, makes symbolic execution of such an FP
specification and extraction of its topological structure difficult [Sch86]. Our approach is to
describe  sequential  circuits  using  the  space-time  duality.  That  is,  a  sequential  circuit  is  described 
as  the  folding  of  a  combinational  circuit  so  that  the  same  structure  performs  a  computation  in 
time  rather  than  space  [Pat85]. 
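
The space-time duality can be illustrated with binary addition. In the minimal Python sketch below (an illustration only; the names `cell`, `add_in_space`, `SerialAdder`, and `add_in_time` are assumptions and do not come from the report), the same full-adder cell is either replicated once per bit position or reused on successive clock ticks with the carry held in a register.

```python
from itertools import product

def cell(carry, a, b):
    """One full-adder cell: returns (next_carry, sum_bit)."""
    s = a ^ b ^ carry
    return (a & b) | (carry & (a ^ b)), s

def add_in_space(xs, ys):
    """Combinational view: one cell per bit position (bits LSB first)."""
    carry, sums = 0, []
    for a, b in zip(xs, ys):        # each iteration stands for a distinct cell
        carry, s = cell(carry, a, b)
        sums.append(s)
    return sums, carry

class SerialAdder:
    """Sequential view: a single cell reused every clock; carry is the state."""
    def __init__(self):
        self.carry = 0

    def clock(self, a, b):
        self.carry, s = cell(self.carry, a, b)
        return s

def add_in_time(xs, ys):
    adder = SerialAdder()
    sums = [adder.clock(a, b) for a, b in zip(xs, ys)]
    return sums, adder.carry

def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

same = all(
    add_in_space(to_bits(x, 3), to_bits(y, 3)) == add_in_time(to_bits(x, 3), to_bits(y, 3))
    for x, y in product(range(8), repeat=2)
)
print(same)
```

The folded version needs only one cell plus one bit of storage, at the cost of one clock per bit.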


2.3  Obtaining  layouts  from  vFP 

As  mentioned  in  the  introduction,  a  key  idea  of  our  design  methodology  is  to  deduce  the 
geometry  of  the  layout  from  a  behavioral  specification  of  the  circuit  rather  than  to  specify  the 
geometry  as  a  part  of  the  behavioral  specification.  This  is  possible  since  an  FP  program  as  a 
behavioral  specification  of  a  circuit  implies  the  topology  of  its  organization,  i.e.,  relative 
positions  of  the  components  and  their  connections  in  the  plane.  Schlag  [Sch86]  has  developed  a 
methodology  for  obtaining  layouts  from  FP  expressions.  This  methodology  is  based  on  a  formal 
notion of the planar topology of a circuit, a mapping from FP expressions to planar circuits, and
a technique for transforming the planar topology of a circuit described in FP into a physical
layout.

The  following  example  of  a  carry-save  array  multiplier,  developed  by  Schlag  [Sch86], 
illustrates  the  layout  obtained  by  applying  our  design  methodology.  Figure  2  shows  a  sketch  of 
the  function  Mult  with  the  functions  HA*,  FA*,  and  FA**  represented  as  components.  Figure  3 
shows  the  same  function  with  those  components  expanded.  Finally,  the  completed  layout  is 
given  in  Figure  4.  Power  and  ground  wiring  has  been  added  to  the  layout  using  a  graphics 
editor.  Since  the  specification  is  generic,  multipliers  of  different  precision  can  be  obtained  by 
applying the Mult function to operands of a desired precision.


2.4  Evaluation  of  Designs 

Sausville  has  implemented  a  package  that  allows  timing  analysis  of  circuits  represented 
by FP programs [Sausville 1986]. His work is restricted to uniform delays. Work is currently
under way to extend this package to deal with arbitrary (user-defined) delays.


2.5  Interfacing  FP  Design  System  and  Existing  VLSI  CAD 

A VLSI design system has been developed by J. Worley [Wor86] using UCLA FP as the
specification language and BDL (Block Design Language) [Slu84] as the input language for
VLSI CAD tools available at Hewlett-Packard. The circuit synthesis proceeds in three steps: (1)
the functional (executable) specification of a digital system is developed and tested, (2) a
specific implementation and its net list is obtained by tracing the symbolic execution of the
specification, and (3) the trace is processed by trace filters to obtain various design
information.  For  example,  there  are  trace  filters  to  print  net  lists,  count  modules  and 
connections,  and  translate  into  other  design  languages.  At  present,  there  are  translators  for  esim, 
a  switch-level  simulator,  the  circuit  level  simulator  SPICE,  and  the  BDL  block  description 
language  [Slu84].  A  BDL  description  is  then  used  to  drive  an  actual  circuit  layout  generator.  In 
this  case,  the  user  control  of  the  topological  features  in  the  layout  was  traded  for  utilities 
provided  by  an  available  tool. 
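
The trace-and-filter idea can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the `Tracer` class, the net naming scheme `n1, n2, ...`, and the two filters are hypothetical and do not reproduce the actual UCLA FP trace format or the BDL, esim, or SPICE translators.

```python
from collections import Counter

class Tracer:
    """Records every gate instantiated during a symbolic execution."""
    def __init__(self):
        self.trace = []                       # entries: (module, inputs, output net)

    def gate(self, module, *inputs):
        out = f"n{len(self.trace) + 1}"       # fresh net name for the output
        self.trace.append((module, inputs, out))
        return out

def full_adder(t, a, b, cin):
    """Executed on symbolic net names instead of bit values."""
    c1 = t.gate("andg", a, b)
    s1 = t.gate("xorg", a, b)
    c2 = t.gate("andg", s1, cin)
    s = t.gate("xorg", s1, cin)
    return t.gate("org", c1, c2), s

def netlist_filter(trace):
    """Trace filter 1: print-ready net list."""
    return [f"{out} = {module}({', '.join(ins)})" for module, ins, out in trace]

def count_filter(trace):
    """Trace filter 2: number of instances per module type."""
    return Counter(module for module, _, _ in trace)

t = Tracer()
carry, total = full_adder(t, "a", "b", "cin")
print("\n".join(netlist_filter(t.trace)))
print(dict(count_filter(t.trace)))
```

A translator to another design language would be one more filter over the same trace, which keeps the specification itself independent of any target tool.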


2.6  Compiler  Research  and  Developments 

A compiler development [Ara86] offers a performance enhancement for the execution
environment of our FP language. Several important techniques to reduce the run-time load have
been introduced and implemented. An efficient and fast threaded FP interpreter/compiler has
been implemented [Pun86]. Alkalaj has introduced a very efficient scheme for garbage
collection for FP programs executed on a uniprocessor [Alk86, Alk87]. These language
processing  schemes  and  tools  are  essential  in  building  an  efficient  design  environment. 
Figure 4. The layout of MULT


2.7 Algorithms for VLSI Implementation


In  the  area  of  algorithms  for  systolic  arrays  the  research  focused  on  analysis  of  design 
alternatives  and  development  of  algorithms  for  linear  algebra  processors.  A  comprehensive 
study  of  alternatives  for  a  singular  value  decomposition  processor  has  been  done  [Mor85].  This 
work  has  been  recently  extended  into  a  proposal  for  the  systematic  design  of  arrays  for  matrix 
computations [Mor87]. An efficient division algorithm, based on the work of Ercegovac and Lang
[Erc85], has been developed [Tu86]. In order to provide a flexible and powerful simulation
environment for this type of research, a two-step simulator has been developed. In the first step,
an FP specification of the algorithm to be simulated is symbolically interpreted to produce a
corresponding  network  at  the  level  of  given  primitives.  In  the  second  step  this  network  is  used  to 
execute  the  algorithm  and  collect  statistics. 


2.8  Future  Research 

We  are  continuing  work  on  two  aspects  of  our  FP-based  VLSI  design  system.  To  utilize 
well-developed  tools  available  at  the  lower  levels  of  VLSI  circuit  design,  we  are  developing  an 
interface  between  UCLA  FP  and  such  tools.  VIVID,  an  integrated  VLSI  design  system, 
developed  at  the  MCNC,  is  selected  as  the  target.  A  translator  from  UCLA  FP  into  ABCD,  a 
specification language of VIVID, is under development [Wu87]. The second aspect of our
continuing research deals with refinement and formalization of the proposed treatment of
sequential  circuits  in  UCLA  FP  [Pat86]. 


3. SUMMARY OF CONTRIBUTIONS: Task 2

The  research  in  this  task  has  focused  on  the  application  of  low-cost  arithmetic  error 
codes  [AVIZ  71],[AVIZ  81],[AVIZ  83]  to  the  concurrent  detection  of  errors  (due  to  both 
transient  and  permanent  faults)  originating  in  systolic  systems.  A  new  generalization  has  been 
developed  that  extends  the  application  of  low-cost  inverse  residue  codes  into  two  dimensions: 
row (byte) and column (line) residues [AVIZ 83]. This extension improves the detection of
errors,  especially  of  those  due  to  indeterminate  faults,  and  provides  certain  error-correction 
capabilities.  Previous  research  investigated  the  advantages  offered  by  two-dimensional  inverse 
residue  codes  in  the  detection  and  correction  of  errors  that  affect  byte-wide  communication 
paths  and  systolic  processing  elements.  Such  paths  are  widely  used  in  high-performance 
systolic  arrays  and  for  inter-processor  communication  in  large  multi-array  systems. 

In  general,  it  has  been  shown  that  the  remaining  undetectable  errors  in  the  message  X  are 
those that are missed by both checks: modulo 2^b - 1 over the bytes (not including the check line
bits), and modulo 2^(k+1) - 1 over the lines, with the check byte bits included in each line. Most
unidirectional  errors  are  detectable;  furthermore,  the  detection  of  bidirectional  errors  is 
significantly  improved.  A  single-line  correcting,  double-line  detecting  property  was  also 
demonstrated  for  unidirectional  errors. 

The  research  has  led  to  the  development  of  the  fundamental  byte-serial  arithmetic 
algorithms  for  operands  encoded  in  two-dimensional  low-cost  inverse  residue  codes.  The 
algorithms  are: 

(a)  the  line-residue  checking  algorithm; 

(b)  the  additive  inverse  (complementation)  algorithm; 

(c)  the  addition  algorithm. 

The  details  of  the  algorithms  have  been  presented  in  [AVIZ  85].  It  has  been  shown  that  byte- 
serial  arithmetic  can  be  carried  out  with  operands  which  are  encoded  in  two-dimensional  residue 
and  inverse  residue  codes.  Two-dimensional  encodings  provide  a  very  powerful  error-detecting 
and  a  substantial  error-correcting  capability  for  byte-serial  arithmetic.  Promising  application 
areas  are  systolic  arrays,  multiple-precision  arithmetic,  and  high-speed  array  computing. 
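
The principle behind a low-cost check can be sketched in one dimension. The Python fragment below is a simplified illustration and not the two-dimensional algorithms of [AVIZ 85]: it assumes b-bit bytes, a check modulus A = 2^b - 1, and an inverse-residue check symbol c chosen so that (x + c) mod A = 0. The function names are hypothetical.

```python
def residue(x, b):
    """x mod (2**b - 1), computed by adding b-bit bytes with end-around carry."""
    mask = (1 << b) - 1                  # check modulus A = 2**b - 1
    while x > mask:
        x = (x & mask) + (x >> b)        # 2**b is congruent to 1 modulo A
    return 0 if x == mask else x

def encode(x, b):
    """Attach an inverse-residue check symbol c with (x + c) mod A == 0."""
    a = (1 << b) - 1
    return x, (a - residue(x, b)) % a

def is_valid(word, b):
    x, c = word
    return residue(x + c, b) == 0

def checked_add(w1, w2, b):
    """Add operands and check symbols separately; the code is preserved."""
    a = (1 << b) - 1
    (x, cx), (y, cy) = w1, w2
    return x + y, (cx + cy) % a

b = 4
w1, w2 = encode(0xBEEF, b), encode(0x1234, b)
total = checked_add(w1, w2, b)
print(is_valid(total, b))

# Flipping any single operand bit changes x by a power of two, which is
# never a multiple of 2**b - 1, so the check fails.
x, c = w1
print(any(is_valid((x ^ (1 << i), c), b) for i in range(16)))
```

The check runs alongside the main addition and needs only a b-bit adder with end-around carry, which is what makes such codes "low-cost".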


Abstract

This  paper  describes  a  method  based  on  applicative  languages  for  the  specification,  evaluation 
and  synthesis  of  hardware  algorithms.  The  goal  of  the  research  effort  is  to  provide  designers  with  an 
environment in which they can rapidly explore alternative designs for their algorithms throughout the
synthesis process. It is possible to specify the algorithm at arbitrary levels of abstraction and have the
system rapidly evaluate certain parameters (e.g. speed, area, etc.) so that designers can make informed
decisions  during  the  synthesis  process.  Layouts  which  are  suitable  as  floor  plans  are  extracted  from 
high-level  algorithms. 


I  Introduction 

The complexity of VLSI design can only be managed by the application of CAD tools at all levels
of the design process. In order to be effective, these tools must be flexible enough to be tailored to
any specific design. Generally, VLSI CAD tools may be distinguished as being of either or both of two
types: bottom-up composition tools or top-down synthesis tools. For bottom-up composition tools, the
user either exactly specifies the placement of modules and the interconnections between them, or relinquishes
control over the layout to the tool's algorithm. Examples of composition tools are graphic layout
editors (e.g. Caesar, Magic) [Ousterhout81, Ousterhout84] and placement and routing tools
[Rivest82]. Top-down synthesis tools are capable of generating layouts from high-level specifications.
Examples would include various register-transfer silicon compilers that have been proposed and built
[Siskind82, Director81, Johannsen79]. Generally, these tools do not provide any estimate of the area or
delays of the circuit during the synthesis process. That is, designers do not know the effects of their
decisions on the performance until the design is complete.

Many  of  the  current  design  approaches  were  largely  developed  for  SSI/MSI  technologies  and  are 
limited  because  of: 

Lack of paradigms to deal with topological and geometrical aspects of algorithm design in a
hierarchical, multi-level fashion.

Lack of adequate methods to deal with communication requirements of VLSI implementations
during a multi-level algorithm design process.

Lack of an adequate interface to lower-level VLSI CAD tools: most systems require a logic
diagram at the entry level, thus forcing designers to cope with details which are apt to be
changed later in the implementation process.

Lack of visual feedback: graphical representations, generated automatically from a high-level
algorithm, showing details selected by the designer are highly desirable.


† Appeared in the Proceedings of the 1985 Functional Programming Languages and Computer
Architecture Conference, Nancy, France, J.P. Jouannaud (Ed.), Lecture Notes in Computer Science 201,
Springer-Verlag, 1985, pp. 238-255.


This  paper  describes  a  method  based  on  applicative  languages  [Backus78]  for  the  specification, 
evaluation  and  synthesis  of  hardware  algorithms.  This  method  is  supported  by  a  set  of  tools  that  is 
being  developed  at  UCLA.  The  goal  of  this  effort  is  to  provide  designers  with  an  environment  in  which 
they can rapidly explore various alternative designs for their algorithms. Thus, it is possible to specify
the algorithm at any arbitrary level of abstraction and have the system rapidly evaluate performance
parameters (e.g. speed, area, etc.) so that designers can make informed decisions during the synthesis
process. The advantage of using an applicative language is that it ties together the specification of the
algorithm, the synthesis of the circuit and the evaluation of the implementation.

Others have explored incorporating applicative languages in VLSI design and have shown them
to be viable. Lahti [Lahti81] used an applicative language to describe various combinational hardware
structures. Johnson [Johnson84] utilized a demand-driven applicative language to describe and synthesize
sequential digital circuits. Cardelli and Plotkin [Cardelli81] take a formal approach to describing
sequential circuits with an emphasis on verification. Meshkinpour [Meshkinpour85] and Sheeran
[Sheeran84] extended Backus' FP language with operators to handle sequential circuits.


2 Brief Introduction to vFP


vFP extends the language FP proposed by Backus [Backus78] with additional functional forms
and primitives. In contrast to μFP [Sheeran84], which extends FP's semantics to operate on streams,
the semantics of vFP are the same as those of FP when it is used to specify algorithms. A program in
vFP (as in FP) is a function that maps objects into objects. Objects are either atomic (numbers or
strings) or sequences of objects. The distinguished atom ⊥ denotes an undefined value. By definition,
any sequence which contains ⊥ as an element is itself undefined and thus equal to ⊥. The primitive functions
of vFP consist of

arithmetic functions,         + : <…> -> 6                          * : <3,2> -> 6
logical functions,            andg : <1,0> -> 0                     org : <0,0> -> 0
predicates,                   atom : <…> -> F                       = : <3,3> -> T
selector functions,           3 : <2,<4,…>,6,<8,<9,10>>> -> 6       last : <1,4,6> -> 6
and structure modifying functions.

trans : <<1,2,3>,<4,5,6>> -> <<1,4>,<2,5>,<3,6>>      apndl : <1,<2,3,4>> -> <1,2,3,4>
distl : <x,<a,b,c>> -> <<x,a>,<x,b>,<x,c>>            distr : <<a,b,c>,x> -> <<a,x>,<b,x>,<c,x>>


Functional forms are used to combine primitive functions into more complex functions.

compose         (f @ g) : x -> f : (g : x)
construct       [f,g,h] : x -> <f:x, g:x, h:x>
apply to all    &f : <p,q,r> -> <f:p, f:q, f:r>
constant        %k : x -> k if x is not ⊥
right insert    !f : <x1,...,xn> -> f:<x1, !f:<x2,...,xn>>
tree insert     |f : <x1,...,xn> -> f:<|f:<x1,...,x⌈n/2⌉>, |f:<x⌈n/2⌉+1,...,xn>>


A major syntactic difference between vFP and Backus' FP is that parameters to functions may
be named and then referred to in the function body with the same restrictions as described in
[Backus81]. In addition, the arithmetic, logical, and predicate functions may be used either in a prefix
or an infix manner. This improves the readability of hardware specifications. For example, the following
definition of a FullAdder in Backus' FP,

FullAdder =
[org@[org@[andg@[1,2],andg@[2,3]],andg@[1,3]], xorg@[1,xorg@[2,3]]]

could alternatively be written in vFP as

defun FullAdder(a,b,Cin)
[((a andg b) org (b andg Cin)) org (a andg Cin), a xorg (b xorg Cin)]
enddef

Owing to the natural specification of parallelism in FP-like languages, they are suited to describing
parallel hardware algorithms. These specifications are executable. Since such programs are
referentially transparent, it is possible to have an algebra of programs which may be used to reason
about their behavior. These methods may be used in conjunction with each other to convince the
designer that the program implements the envisioned algorithm. Specifications can also be executed
symbolically using a symbolic input during which it is possible to extract the topological structure of
the algorithm. Therefore, there is a direct relationship between the structure of an algorithm written in
vFP and the planar topology of its layout.
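
Two laws from Backus' algebra of programs [Backus78] show the kind of reasoning meant here: &f @ &g = &(f@g), and [f,g]@h = [f@h, g@h]. The Python sketch below spot-checks both on sample inputs; it is an illustration only (a check on samples is not a proof), and the helper names and the functions `inc` and `dbl` are assumptions made for this example.

```python
def compose(f, g):
    """f@g"""
    return lambda x: f(g(x))

def construct(*fs):
    """[f,g]"""
    return lambda x: tuple(f(x) for f in fs)

def apply_to_all(f):
    """&f"""
    return lambda xs: tuple(f(x) for x in xs)

inc = lambda n: n + 1
dbl = lambda n: 2 * n

# Law 1: &f @ &g = &(f@g)   (two passes over a sequence fuse into one)
lhs1 = compose(apply_to_all(inc), apply_to_all(dbl))
rhs1 = apply_to_all(compose(inc, dbl))

# Law 2: [f,g]@h = [f@h, g@h]   (a shared stage can be duplicated or merged)
lhs2 = compose(construct(inc, dbl), dbl)
rhs2 = construct(compose(inc, dbl), compose(dbl, dbl))

samples = [(), (0,), (1, 2, 3)]
law1_holds = all(lhs1(s) == rhs1(s) for s in samples)
law2_holds = all(lhs2(n) == rhs2(n) for n in range(5))
print(law1_holds, law2_holds)
```

Read as circuits, the two sides of law 2 compute the same function but differ in structure: the left side shares one instance of h, the right side uses two.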


3  Algorithm  Synthesis  in  vFP 

The designer first specifies the algorithm in vFP. The algebra of vFP programs may be used to
reason about the algorithm. In addition, the specification may be executed with sample data to validate
the program.

vFP can be used to describe circuits at various levels of abstraction. Designers are free to
choose whichever level is "best" for their current purposes. At some higher level of abstraction, the
structure of the FullAdder may not be relevant, and thus the definitions given earlier would suffice to
describe its behavior. At a lower level of abstraction, where the structure of a function is to be considered,
an alternative definition which has a different structure may be substituted. For example, the
FullAdder could be defined in terms of HalfAdders.

defun FullAdder(a,b,Cin)
[1 or 2, 3] @ apndl @ [1, HalfAdder@[2,3]] @ apndr @ [HalfAdder@[a,b], Cin]
enddef

defun HalfAdder(a,b) [a andg b, a xorg b] enddef

Transformations  may  be  used  to  refine  the  program  to  whatever  level  of  detail  is  required.  In 
this  way  it  is  possible  to  first  specify  the  algorithm  at  a  level  of  abstraction  that  is  high  enough  to  aid 
validation  and  then  refine  it  to  the  level  at  which  it  can  be  easily  implemented. 
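
That a structural refinement preserves behavior can be confirmed by exhaustive execution when the input space is small. The Python sketch below compares a flat full adder with one built from two half adders; it is an illustration, and the function names are assumptions rather than vFP definitions.

```python
from itertools import product

def full_adder_flat(a, b, cin):
    """Behavioral definition: <carry, sum> from sum-of-products and xor."""
    carry = ((a & b) | (b & cin)) | (a & cin)
    return carry, a ^ (b ^ cin)

def half_adder(a, b):
    return a & b, a ^ b

def full_adder_from_half_adders(a, b, cin):
    """Structural refinement: two half adders and one OR gate."""
    c1, s1 = half_adder(a, b)
    c2, s = half_adder(s1, cin)
    return c1 | c2, s

equivalent = all(
    full_adder_flat(*bits) == full_adder_from_half_adders(*bits)
    for bits in product((0, 1), repeat=3)
)
print(equivalent)
```

The two versions differ in gate count and depth, so the choice between them is a design decision that the evaluation tools are meant to inform.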


4  The  Evaluation  of  vFP  Algorithms 


It  i*  possible  to  tag  selected  user-defined  functions  so  that  when  a  vFP  specification  is  executed 
an  estunate  of  the  performance  of  the  algorithm  can  be  provided.  Tagging  a  function  tells  the  system 
that  this  is  a  function  of  interest  at  the  current  level  of  abstraction.  As  the  execution  proceeds,  the  inter¬ 
preter  keeps  crack  of  the  level  at  which  a  tagged  function  is  executed. 

The level of a tagged function is defined as one plus the maximum of the levels associated with
the atoms in its input object. The level of each atom is initially zero. Each time a tagged function is
encountered, its level is determined and is assigned to the atoms in its output object. However, there is
a problem when a tagged function occurs within another tagged function. In this case the level of the
inner function is determined with respect to the outer function, resulting in a hierarchy of levels. This is
accomplished by assigning the level zero to each atom of the outer function's input object, and computing
the level of tagged functions as before until the computation of the outer function has been completed.
The level of the outer function and the atoms in its output object is determined as before and
hence is independent of whether or not any tagged function occurs within the outer function. Levels are
used to predict the speed at which the circuit would perform, to obtain an idea of where the parallelism
in the algorithm is, and to get an estimate of the area that would be occupied by the circuit. A better
estimate of the area is obtained by methods mentioned in the next section.
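The level rule can be illustrated with a minimal Python sketch. It is not part of the vFP system: the names `Atom`, `tagged`, and `gate_counts` are hypothetical, only two-input gates are modeled, and nested tagged functions are not handled. Each atom carries a level, and a tagged gate assigns one plus the maximum input level to its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    value: int
    level: int = 0  # every input atom starts at level zero


# (gate name, level) -> number of gate instances seen at that level
gate_counts = {}


def tagged(name, op):
    """Wrap a two-input boolean operation as a tagged function (hypothetical helper)."""
    def gate(x, y):
        level = 1 + max(x.level, y.level)  # one plus the maximum input level
        gate_counts[(name, level)] = gate_counts.get((name, level), 0) + 1
        return Atom(op(x.value, y.value), level)
    return gate


andg = tagged("andg", lambda a, b: a & b)
org = tagged("org", lambda a, b: a | b)
xorg = tagged("xorg", lambda a, b: a ^ b)


def half_adder(a, b):
    return andg(a, b), xorg(a, b)  # (carry, sum)


def full_adder(a, b, cin):
    c1, s1 = half_adder(a, b)
    c2, s = half_adder(s1, cin)
    return org(c1, c2), s


carry, total = full_adder(Atom(1), Atom(1), Atom(1))
```

After the call, `carry.level` and `total.level` give the depth of each output in gate delays, and `gate_counts` holds a per-level gate tally of the same kind as the statistics reported by the interpreter.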

This capability of having the system estimate performance parameters is useful in tradeoff
analyses. For example, consider the following function:

\[ z = \begin{cases} 1 & \text{if } a = (b-1) \bmod 8 \\ 2 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases} \]

schemeA and schemeB, below, are two algorithms for implementing the function. If the boolean
functions (andg, org, notg, and xorg) are tagged, the results shown in Figure 1 are obtained.

* scheme A inputs : ((a) (b)) outputs : (z1 z0)
defun schemeA

&(!org @ &andg @ trans) @ distl @ [1, [id, rotr] @ 2] @ &decoder

enddef

* scheme B inputs : ((a) (b)) outputs : (z1 z0)
defun schemeB(a,b)

^compare  @  distl  <§  (a.  [b.  ti  @  adder  (b.  (%1.  %\,  <M]]]) 
enddef 

defun compare notg @ !org @ &xorg @ trans enddef
defun adder

apndl @ [xorg @ [xorg @ 1, 1 @ 2], tl @ 2]
@ [1, !add @ apndr @ [tlr, HalfAdder @ last] @ tl] @ trans
enddef

defun add(a,b) concat @ [FullAdder @ apndr @ [a, 1 @ b], tl @ b] enddef


* statistics for schemeA

level     Andg    Org    Notg
  1         2              6
  2        10
  3        14
  4        14
  5                 8
  6                 4
  7                 2
Totals     40      14      6

* statistics for schemeB

level     Andg    Xorg    Org    Notg
  1         2       6
  2         1       2      1
  3                 1      2
  4                 1              1
  5                 1
  6                        1
  7                        1
  8                                1
Totals      3      11      5       2

Figure  1:  A  comparison  of  two  implementations 

The results in Figure 1 show that schemeA uses a total of 60 gates, while schemeB uses a total of 21
gates. However, it is to be noted that 11 of the 21 gates are xor gates, which would normally occupy a
larger area. Given an estimate of the area occupied by each of the gates, it is possible to obtain an
estimate of the area occupied by each implementation. Since schemeA has 7 levels while schemeB has 8,
schemeA would be faster than schemeB under the assumption that all the tagged functions have the same
delay.

In addition to the time and space estimates provided by the level mechanism, the system can be
extended to allow the specification and calculation of user-specified parameters for each tagged function
and for the algorithm as a whole.


5  Space  Domain  Implementations  of  vFP  Algorithms 

A vFP algorithm can be mapped into a structure corresponding to a combinational network by
passing symbolic inputs to functions which in turn generate symbolic outputs. The unit of information
represented by a symbolic atom can be anything corresponding to the level of abstraction. Thus, a
symbolic atom may represent a wire, a set of wires, a bit vector, or an integer, as required. An acyclic
computation graph with vFP primitives as nodes is obtained by tracing the application of a function to a
symbolic input. This computation graph can be transformed into a layout using techniques described
later. By tagging the appropriate functions, the layout may be generated at any desired hierarchical
level. For example, Figure 2 shows a layout of a FullAdder using and, or, and xor gates, whereas
Figure 3 shows the same FullAdder as being composed of HalfAdders.

The mapping from a vFP algorithm to a combinational network is allowed under the following
restrictions. Functions like iota, whose output structure depends on an input value, cannot be laid out.
In addition, during symbolic evaluation, the predicate part of a conditional must be evaluable to a
boolean.
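The idea of tracing a function on symbolic inputs can be sketched in Python as follows. This is an illustration under assumptions, not the vFP interpreter: the `Tracer` class and the wire names `w1`, `w2`, ... are hypothetical, and each symbolic atom is taken to be a single wire.

```python
import itertools


class Tracer:
    """Records every primitive application as a node of an acyclic graph."""

    def __init__(self):
        self.nodes = []  # list of (primitive name, input wires, output wire)
        self._ids = itertools.count(1)

    def prim(self, name, *inputs):
        out = f"w{next(self._ids)}"  # fresh symbolic output wire
        self.nodes.append((name, inputs, out))
        return out


def half_adder(t, a, b):
    return t.prim("andg", a, b), t.prim("xorg", a, b)  # (carry, sum)


def full_adder(t, a, b, cin):
    c1, s1 = half_adder(t, a, b)
    c2, s = half_adder(t, s1, cin)
    return t.prim("org", c1, c2), s


t = Tracer()
outputs = full_adder(t, "a", "b", "cin")
```

Here `t.nodes` is the gate-level graph of the FullAdder. Recording a single node per call to `half_adder` instead would give the coarser, HalfAdder-level view, which is the effect that choosing which functions to tag has on the generated structure.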

Figure 2: The structure of a FullAdder

Figure 3: A FullAdder using HalfAdders


As in μFP [Sheeran84], structural iterations over the input of the circuit can be handled by the
insert and apply-to-all functional forms. Other types of structural recursions are allowed in vFP since
the conditional functional form is treated as a structural form for the purposes of layout. Depending on
the value of the predicate of the conditional, either the consequent or the alternate part will be evaluated
symbolically for its structure, but no structure will be generated for the predicate part. A new primitive
called sw (for switch) is provided in vFP which corresponds to the conditional form in μFP. This
primitive takes three arguments. If the first is 1 then the output is the second argument; if it is 0 then the
output is the third argument; otherwise it is undefined (⊥). In addition, it is required that the structures
of the second and third arguments be the same.
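The behavior of sw can be sketched in Python as below. The sentinel `BOTTOM` and the helper `shape` are hypothetical names, and raising an error on a structure mismatch is an assumption of this sketch rather than documented vFP behavior.

```python
BOTTOM = object()  # stands in for the undefined result


def shape(x):
    """Structure of an object: the nesting of its sequences, ignoring atom values."""
    if isinstance(x, (list, tuple)):
        return tuple(shape(e) for e in x)
    return None


def sw(select, second, third):
    # Both data arguments must have the same structure so that the
    # layout does not depend on the run-time value of the selector.
    if shape(second) != shape(third):
        raise ValueError("sw: second and third arguments differ in structure")
    if select == 1:
        return second
    if select == 0:
        return third
    return BOTTOM
```

For example, `sw(1, [1, 0], [0, 1])` yields `[1, 0]`, while a selector other than 0 or 1 yields `BOTTOM`.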

A vFP description of a circuit can be generic in the sense that the description is independent of
the input dimensions of the circuit. For example, there needs to be only one description of a decoder.
This same description works for a decoder independent of the number of inputs. The 3-to-8 decoder
shown in Figure 4 is obtained by evaluating the description of the generic decoder with a symbolic
argument of size 3. Figure 4 shows how the generic iterative decoder is formed by first applying 1-to-2
decoders (Dec1) to the inputs and then inserting the function DecStage. DecStage takes an n-to-2^n
decoder and a new input to make an (n+1)-to-2^(n+1) decoder.

defun Decoder !DecStage @ &Dec1 enddef

defun DecStage &andg @ concat @ &distl @ distr enddef

defun Dec1 [notg, id] enddef

As  before,  these  implementations  may  be  evaluated  to  get  speed/area  estimates,  but  now,  since  routing 
is  taken  into  account,  a  better  estimate  of  area  can  be  provided. 
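The generic decoder can be mimicked with a short Python sketch. The function names follow the vFP definitions, but the code is an illustration under assumptions: bits are Python integers, the most significant bit comes first, and `functools.reduce` folds from the left, which may differ from the order in which the vFP insert form combines stages.

```python
from functools import reduce


def dec1(x):
    """1-to-2 decoder: [notg, id] applied to a single bit."""
    return [1 - x, x]


def dec_stage(left, right):
    """AND every output of `left` with every output of `right`."""
    return [a & b for a in left for b in right]


def decoder(bits):
    """n-to-2^n decoder for a non-empty list of bits (MSB first)."""
    return reduce(dec_stage, [dec1(b) for b in bits])
```

The same definition serves every input width: `decoder([1, 0, 1])` returns a list of eight outputs with a single 1 at index 5, and a four-bit argument produces sixteen outputs without any change to the description.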

Cell iterative networks are combinational circuits which are formed by interconnecting a particular
cell in a regular pattern. Although combinational circuits without feedback can be described in vFP
using the forms inherited from FP, some additional functional forms are provided to give designers
more control over exactly how cell iterative networks are to be laid out. These networks are thus


system in vFP. D^{-1} is a phantom element that corresponds to an inverse time delay. It is used to keep
track of the number of clock pulses the output is going to be delayed by. This information is needed by
the construct functional form to synchronize its components, since the semantics of the construct require
that the outputs of its elements appear together. Generally the D^{-1} elements are moved, via
transformations, to the outputs of the circuit where they serve to denote the delay.

When elements of a sequence are available serially in time along the same wire(s), it is
necessary to know when each element is valid. This is accomplished, during symbolic simulation, by having
each symbolic item carry the name of a clock with it. It is assumed that its value will be stable before
every tick of the named clock. The system will automatically widen the intervals between clock ticks to
ensure that this is true. Initially, all the inputs are associated with the same clock. Each combinational
element will assign to its output the clock associated with its input. If there are n elements to the input
sequence of a SOPI, then each of its output elements will be clocked by the clock nC_t; and conversely
for a POSI. A clock named nC_t denotes a clock which has n clock ticks in between consecutive ticks
of the clock named C_t. Though the description of a SOPI or POSI is generic, the value of n (the
number of elements in the sequence) must be known at layout time.


POSI @ D^{-1} @ SOPI ≡ SOPI @ D^{-1} @ POSI ≡ id
f @ D^{-1} ≡ D^{-1} @ f


[tlr, last] @ apndr ≡ apndr @ [tlr, last] ≡ id

the  program  may  be  transformed  into  the  following 

D_l  <3  !t*  (3  apndr  (3  (4cT«  <3  SOPI  @  dr .  •  <3  last] 

whose layout is shown in Figure 8. The single D^{-1} element denotes that the output is delayed only one
clock tick from the input.

This implementation accepts all its inputs simultaneously and eventually gives its result. It will
only work for input sequences of one particular length, since only fixed size SOPIs can be laid out.
However, if each element of the input sequence was input serially to the implementation, a corresponding
POSI could be introduced at the input and then used to transform out the SOPI that exists in the
current implementation. This would make the implementation generic in the sense that it would be able
to handle arbitrary length sequences as its inputs.


7  Layouts  from  vFP  Specifications 

In this section the mapping from vFP algorithms to layouts is briefly described. A more detailed
exposition and a description of its implementation can be found in [Schlag84]. The intent of this
system is to provide the vFP designer with an interactive environment in which the design can be viewed
as it is constructed. The mapping from vFP is actually the composition of two mappings. A vFP
function is first mapped to an intermediate form (IF) which reflects the planar topology of the function, and
then this IF is mapped to fixed geometry by selecting and resolving relative position constraints
(compaction).

The rationale for dividing the mapping into two steps is the observation that a certain portion of
the mapping from vFP should be functional even though the entire map cannot be. That is, a particular
vFP function applied to a particular symbolic object should define an IF uniquely, while the geometry of
the function should depend on its environment. Fixing the geometry of a sub-function may create wiring
and shape incompatibilities with other sub-functions which would require additional area to resolve.
Functionality has two advantages.

1. The mapping can be implemented as an application of a vFP function to an object.

2. Algebraic transformations on the vFP function have predictable effects on the IF.

The extraction of the topology (IF) of a vFP expression is implemented as an interpreter. A function
applied to an object generates an IF, and each combining form dictates a topological organization of the
IFs of its sub-functions. The routing is the direct result of the routing primitives and the combining
forms of the function. This implementation generates a sketch of the vFP specification in terms of
"boxes" and "wires" by symbolically tracing the vFP function and representing each atom as a wire.
The level of abstraction of the sketch can be controlled by selecting which functions to represent as
boxes and what objects each atom represents. The IF consists of a list of horizontal cross-sections, each
of which is a left to right ordering of the "boxes" and "wires" which intersect the cross-section. Any
cross-overs are represented explicitly; each cross-section corresponds to a horizontal track. The IF
generated by the interpreter is fed to a program which resolves the cross-sections using horizontal
compaction and displays the sketch on a graphics terminal.

An example is presented to illustrate how vFP facilitates the transition from algorithm to
implementation taking into account the layout. The vFP specification of a carry chain adder [Brent80] is
considered. The specification is generic in that it adds two bit vectors of length 2^n for n > 0. The input
consists of the 2^n pairs of bits to be added, with the leftmost pair containing the least significant bits,

\[ ((a_1, b_1), (a_2, b_2), \ldots, (a_{2^n}, b_{2^n})). \]

For $1 \le i \le j \le 2^n$,

\[ P_{i,j} = \begin{cases} 1 & \text{if a carry into column } i \text{ would propagate as a carry out of column } j \\ 0 & \text{otherwise} \end{cases} \]

and

\[ G_{i,j} = \begin{cases} 1 & \text{if adding columns } i \text{ through } j \text{ causes a carry out of column } j \\ 0 & \text{otherwise} \end{cases} \]

The computation is performed by computing the carries for each column, $G_{1,i}$, and then obtaining the
sum bit using

\[ s_i = P_{i,i} \oplus G_{1,i-1} \ \text{for } 1 < i \le 2^n, \qquad s_1 = P_{1,1}, \qquad s_{2^n+1} = G_{1,2^n}. \tag{7.1} \]

$P_{1,j}$ and $G_{1,j}$ are computed for each $j$ by using the following identities, implemented by the function
PG. For $i \le j < k$,

\[ P_{i,k} = P_{i,j} P_{j+1,k} \qquad \text{and} \qquad G_{i,k} = G_{j+1,k} + P_{j+1,k} G_{i,j}. \tag{7.2} \]

The initial $P_{i,i}$ and $G_{i,i}$ are computed by the function PG1.

\[ P_{i,i} = a_i \oplus b_i, \qquad G_{i,i} = a_i b_i \tag{7.3} \]

The computation of $P_{1,j}$ and $G_{1,j}$ is achieved in two steps by the function getcarries. The following is
the specification of getcarries.

* input - ((a0,b0),(a1,b1),(a2,b2),...,(a2**n - 1,b2**n - 1))

defun getcarries secondhalf@split@1@firsthalf@&PG1 enddef

defun firsthalf if eq@[length,%1] then id else firsthalf@&stage1@pair fi enddef

defun stage1 concat@[&D@1,&D@tlr@2,[PG@[last@1,last@2]]] enddef

defun  secondhalf  i/eq($(length,%l]<®l  then  done  else  secondhaif($concai(® 

(split@<fcD<a>  l  ,sttge2(S>ti]<2>apndr@[concat(g>&(iii2a3t|(g>tir.I  astj  fi 
enddef 

defun  stage2  concatt® 

&([apndr(®[&D(®tlr(S>l(3)2.PG<®(  Uast(S>  l($21],<fcD(a>2(®2I(3>(  l.splii(a>2])<3>pair 
enddef 


defun D id enddef

defun done id enddef

First getcarries computes (P_{i,i}, G_{i,i}) by applying PG1 to the two bits in each column and then it applies
firsthalf. firsthalf computes (P_{i-2^k+1,i}, G_{i-2^k+1,i}) for each column i = (2m+1)2^k, where m is an integer.
This is accomplished by arranging each column (i.e. its pair (P, G)) in a group of its own and then
recursively applying the function stage1 to pairs of groups until only a single group remains. stage1
combines a pair of groups, computing a new (P, G) for the last column of the second group by applying
PG to the last columns of the two groups; the pair of groups is then concatenated to form one group. All
other columns are unchanged; the function D, which is given the definition id, is applied to them. When
all columns are in a single group, getcarries applies the function secondhalf to compute the final
(P, G)s. secondhalf is also recursive, terminating when each column is in a group by itself. At each
step the final (P, G)s of decreasing multiples of powers of 2 are computed. Assume that in the previous
step P_{1,i} and G_{1,i} have been computed for each column i = m2^k. In the next step, to compute the
(P, G)s of columns which are multiples of 2^{k-1}, it is necessary only to compute new (P, G)s for
columns i = (2m+1)2^{k-1} = m2^k + 2^{k-1}, the odd multiples of 2^{k-1}. The current (P, G) in column i is
(P_{m2^k+1,i}, G_{m2^k+1,i}), so (P_{1,i}, G_{1,i}) can be obtained by applying PG to the current (P, G) and
(P_{1,m2^k}, G_{1,m2^k}). Initially the columns are divided into two groups and, since firsthalf computed the
final (P, G)s for powers of 2, the last column (a multiple of 2^{n-1}) has its final value. At each step
secondhalf duplicates the last column from each group and then applies stage2 after removing the first
group. stage2 takes each group, splits it into two, and computes new (P, G)s for the last column in the
left group of each new pair using the duplicated column immediately to the left of the group. The first
group is then appended to the result of stage2.


Figure 9: The sketch of getcarries with each (P,G) as a wire


Figure 9 is the sketch of getcarries in which the pair (P, G) for each column is represented by a
single wire. This is accomplished by directing the interpreter to draw PG1, PG and D as boxes and by
giving PG1 and PG symbolic definitions.

(define-symbolic PG1 input=(a b) output=(c))

(define-symbolic PG input=(a b) output=(c))


(drawbox PG1 label=PG1 ht=2)
(drawbox PG label=PG ht=2)
(drawbox D label=D ht=2)




Figure  10:  The  sketch  of  getcarries  with  the  definitions  of  PG1  and  PG  filled  in 


Figure 10 is the sketch of getcarries with each wire corresponding to a bit this time and with the
specifications of the functions PG and PG1 "filled in." The definition of done is changed since only
the G of each column is required to compute the final sum. Notice that the layout interpreter generates
only those wires and boxes which have paths to an output. The previous specification would be
extended as follows.

defun done &(2@1) enddef
defun PG1 [[xorg,andg]] enddef

defun PG [andg@[1@1,1@2],org@[andg@[2@1,1@2],2@2]] enddef
(drawbox D label=D ht=2)

To obtain the final sum by (7.1) it is necessary to combine the first P in each column, P_{i,i}, with the G of
its left neighbor, G_{1,i-1}. One way of doing this would be to duplicate each P_{i,i} generated by PG1, route
them along the side, and then merge them back into the columns to compute the final sum as in Figure
11. The additional area required for routing makes this an unattractive alternative. A better design
would be to route the P_{i,i} along with the (P, G) down its own column. This extension is easily handled
in vFP by modifying the function PG so that it duplicates its first argument if it receives only two
arguments in a column and simply passes on the extra argument otherwise. The specification is modified as
follows.


Figure 11: An inefficient design


defun add

concat@[[1],&xorg@pair@tl@tlr,[last]]

@concat@apndl@[1@1,&([1,3]@1)@tl]@getcarries
enddef

defun PG

apndl@[D@1@2, if null@tl@tl@2 then oldPG else oldPG@[1,tl@2] fi]

@ if null@tl@tl@1 then id else [tl@1,2] fi
enddef

defun oldPG [andg@[1@1,1@2],org@[andg@[2@1,1@2],2@2]] enddef
defun done id enddef
(drawbox D label=D ht=2)

The function add applies getcarries and then handles the columns according to (7.1) to obtain the final
sum bits. Figure 12 is the sketch of add.
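The overall computation (initial (P, G) pairs, a two-phase carry computation, then the sum bits) can be checked with a small Python sketch. This is an array-based parallel-prefix formulation in the same spirit as firsthalf and secondhalf, not a transcription of the vFP program: the index arithmetic and the helper names are assumptions of the sketch, columns are numbered from 0, and the input length is assumed to be a power of two.

```python
def pg1(a, b):
    """Initial (P, G) of a single column."""
    return (a ^ b, a & b)


def pg(low, high):
    """Combine the (P, G) of a lower-order range with the adjacent higher-order range."""
    p1, g1 = low
    p2, g2 = high
    return (p1 & p2, (g1 & p2) | g2)


def get_carries(cols):
    """Return, for every column i, the (P, G) of the range from column 0 to column i."""
    pgs = list(cols)
    n = len(pgs)
    d = 1
    while d < n:  # first phase: final values for the last column of each doubling group
        for i in range(2 * d - 1, n, 2 * d):
            pgs[i] = pg(pgs[i - d], pgs[i])
        d *= 2
    d = n // 4
    while d >= 1:  # second phase: fill in the remaining columns
        for i in range(3 * d - 1, n, 2 * d):
            pgs[i] = pg(pgs[i - d], pgs[i])
        d //= 2
    return pgs


def add(a_bits, b_bits):
    """Add two bit lists (least significant bit first); returns n + 1 sum bits."""
    init = [pg1(a, b) for a, b in zip(a_bits, b_bits)]
    final = get_carries(init)
    sums = [init[0][0]]
    sums += [init[i][0] ^ final[i - 1][1] for i in range(1, len(init))]
    sums.append(final[-1][1])  # carry out of the last column
    return sums
```

For instance, `add([1, 1, 0, 0], [1, 0, 1, 0])` adds 3 and 5 (bits listed least significant first) and returns the five bits of 8.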

Figure 13 is the sketch obtained of a carry save array multiplier. This example is presented to
illustrate the geometric flexibility of fixing only the planar topology in the specification. The functions
HA*, FA*, FA**, and HalfAdder are represented as primitives. The specifications of the functions
HA*, FA*, and FA** are

* FA* op2 : ((a b) (y x)) --> ((c x) s) where 2c + s = (a + b + yx)
defun  op2  ([org<®U,l<®21  Jl,2@2](® 

( I  (3>  1  _HalfAdder@(2@  1 ,2],3](®rHal/Adder(g>  l  ,andg(g>2,2@2] 
enddef 

* HA* op1 : ((a) (y x)) --> ((c x) s) where 2c + s = (a + yx)

defun  opl  ([l<3>l.2],2(a)l|@fHalfAdden3)(l(a)l,andg<Sl21.2(S)2]  enddef 


* FA** lop2 : ((a b) (y x)) --> (c s) where 2c + s = (a + b + yx)
defun  lop2 

(org@(  1 . 1  @2],2(f  21<f(  I  <?  1  .Half  Adder@(2(|>  1 .21)<?rHalf  Adder®  t  ,andg@2) 
enddef 

defun op0 [1,andg] enddef

Figure 14 contains the sketches of these functions, while in Figure 15 the same carry save array
multiplier is represented in terms of lower level primitives. Note that the geometry of the functions HA*,
FA* and FA** varies; each instance has some flexibility in adapting to the particular geometric
constraints it encounters.

Figure 14: FA*, HA* and FA**


All of the figures in the paper, with the exception of Figures 5 and 6, were generated by this
system. It is limited in that the data flow is vertical, with a function's inputs and outputs on the top or
bottom. More sophisticated layout techniques which would not suffer from this limitation are being
examined. However, this system is useful in that it provides visual feedback quickly, allowing the designer to
see the planar implications of the specification.


8  Concluding  Remarks 

The objective of this research is to develop a formal high-level language approach to
specification, simulation, performance evaluation, and chip layout planning for VLSI systems. Our
approach takes a high-level applicative language (vFP) and programming style as its basis. The
rationales for using vFP and its potential in dealing with several specification and implementation
aspects are the subject of this paper. Specifically, a few examples have illustrated how vFP can be used
to specify combinational, iterative, and sequential circuits. User-specifiable performance parameters
may be used at any abstraction level to provide a basis for making design decisions during the synthesis
process. Layouts which are suitable as floor plans are extracted from high-level algorithms. Currently,
an automated attribute system is under development. More sophisticated layout techniques and
topological optimizations are being examined, as are techniques to handle other classes of sequential circuits.


Figure 15: The Carry Save Array Multiplier with FA*, HA* and FA** filled in


9  Acknowledgements 

This work was supported in part by ONR Contract N00014-83-K-0493, and by Rockwell/UC
MICRO Grant 137. The presentation and content of this paper have benefited greatly from the detailed
comments provided by one of the referees.


REFERENCES 


[Backus78]  John Backus, "Can programming be liberated from the von Neumann style? A
functional style and its algebra of programs," Communications of the ACM 21(8),
pp. 613-641 (August 1978). 1977 ACM Turing award lecture.

[Backus81]  John Backus, "The Algebra of Functional Programs: Function Level Reasoning,
Linear Equations, and Extended Definitions," Proceedings International Colloquium
on Formalization of Programming Concepts, Lecture Notes in Computer Science
#107, pp. 1-43, Springer Verlag (1981).

[Brent80]  R. P. Brent and H. T. Kung, "The Chip Complexity of Binary Arithmetic,"
Proceedings 12th ACM Symposium on the Theory of Computing, pp. 190-200 (May
1980).

[Cardelli81]  Luca Cardelli and Gordon Plotkin, "An Algebraic Approach to VLSI Design," pp.
173-192 in VLSI 81 - Very Large Scale Integration, First International Conference
on VLSI, ed. John P. Gray (1981).

[Director81]  S. Director, A. Parker, D. Siewiorek, and D. Thomas, "A Design Methodology and
Computer Aids for Digital VLSI Systems," IEEE Transactions Circuits and
Systems CAS-28(7), pp. 634-645 (July 1981).

[Johannsen79]  David Johannsen, "Bristle Blocks: A Silicon Compiler," Proceedings 16th Design
Automation Conference, pp. 310-313 (June 1979).

[Johnson84]  Steven Johnson, Synthesis of Digital Designs from Recursion Equations, MIT Press
(1984).

[Lahti81]  D. O. Lahti, "Applications of a Functional Programming Language," Tech. Rep.
CSD-810*03, UCLA Computer Science Department, Los Angeles, California
(April 1981).

[Meshkinpour85]  F. Meshkinpour and M. D. Ercegovac, "A Functional Language for Description and
Design of Digital Systems: Sequential Constructs," Proceedings of the 22nd Design
Automation Conference, pp. 238-244 (June 23-26, 1985).

[Ousterhout81]  John Ousterhout, "Caesar: An Interactive Editor for VLSI Layouts," VLSI Design
II(4), pp. 34-38 (fourth quarter 1981).

[Ousterhout84]  J. K. Ousterhout, G. T. Hamachi, R. N. Mayo, W. S. Scott, and G. S. Taylor,
"Magic: A VLSI Layout System," Proceedings of the 21st Design Automation
Conference, pp. 152-159 (June 25-27, 1984).

[Rivest82]  Ronald L. Rivest, "The 'PI' (Placement and Interconnect) System," Proceedings of
the 19th Design Automation Conference, pp. 475-481 (June 1982).

[Schlag84]  Martine Schlag, "Extracting Geometry from FP for VLSI Layout," Tech. Rep.
CSD-840043, UCLA Computer Science Department, Los Angeles, California
(October 1984).

[Sheeran84]  Mary Sheeran, "muFP, a language for VLSI design," Proceedings of the 1984
ACM Conference on Lisp and Functional Programming, pp. 104-112 (August 6-8,
1984).

[Siskind82]  Jeffrey Mark Siskind, Jay Roger Southard, and Kenneth Walter Crouch, "Generating
Custom High Performance VLSI Designs from Succinct Algorithmic Descriptions,"
1982 MIT Conference on Advanced Research in VLSI, pp. 28-40 (January
1982).


A  FUNCTIONAL  LANGUAGE  FOR  DESCRIPTION 
AND  DESIGN  OF  DIGITAL  SYSTEMS: 
SEQUENTIAL  CONSTRUCTS 


F.  Meshkinpour 
M.D.  Ercegovac 


Reprinted  from  IEEE  PROCEEDINGS  OF  THE  22ND 
ACM/IEEE  DESIGN  AUTOMATION  CONFERENCE, 
Las  Vegas,  Nevada,  June  23-26,  1985 


A Functional Language for Description and Design
of Digital Systems: Sequential Constructs


F.  Meshkinpour*  and  M.D.  Ercegovac 


Computer Science Department
University  of  California  at  Los  Angeles 
Los  Angeles,  CA  90024,  USA 


Abstract 

A functional (applicative) hardware description
language (FHDL), capable of dealing with both
sequential and combinational systems, is discussed. The
language supports multi-level executable specifications
and interpretation of functional specifications as
implementations at a given level of primitives. That is, the
FHDL specifications are symbolically interpreted to
produce structural representations (implementations) of
hardware algorithms. The symbolic interpreter presently
implements the specification of hardware algorithms at
the gate level. The FHDL allows definition of function
attributes, such as delay and number of logic levels, so
that the performance characteristics of implementations
can be obtained during simulation.


I.  Introduction 

Background: High-level hardware description
languages (HDL) are used in various phases of design in
order to reduce the design time and errors, and simplify
checking, debugging and modification of specifications
and the corresponding implementations [19]. The
high-level HDLs are also used in simulation at various levels
in the design hierarchy. Multi-level simulation is a very
important aspect of VLSI design because of the lengthy
manufacturing process.

The HDLs have been following the evolution of
programming languages in the sense that both the
imperative (procedural) and applicative (functional,
nonprocedural) languages have been considered as models.
The HDLs based on conventional, imperative languages
have several serious deficiencies: they have no rigorous
basis, their syntax and semantics are complicated, their
constructs are ad hoc, and they reflect closely the
sequential model of computation. Consequently, the
HDL programs tend to be complex and error-prone,
difficult to compose out of other programs, provide no
inherent basis for checking and verification, and do not
support concurrency. Moreover, there is no direct
correlation between a high-level specification and its
implementation at a topological/geometrical level. Use of
variables and the possibility of side-effects make the analysis
of algorithms and their implementations very difficult.
The chief advantage of conventional HDLs is a common
familiarity and wide-spread use of conventional
languages.

Recognizing the potential benefits of languages
with formal foundations, simple and precise semantics,
and inherent power to deal with concurrency and
multi-level abstractions, several researchers have considered
nonconventional language approaches for specification
and design of digital systems. Functional (applicative)
programming languages [1,2] satisfy these properties
[13,6,17,11]. The basis of functional programming (FP)
style is the representation of computations by functions
that map objects into objects, and functional forms that
combine functions. Objects are atoms (e.g., numbers and
strings) or sequences of objects. Since an FP program is
a function, it provides a generic specification of a
computation: it applies to any size of the input object. An FP
language allows hierarchical description of computations.
Since the language does not have side-effects, the
composition and analysis of programs are straightforward
[1,6,17]. Possibly the most significant property is the
mathematical basis of FP languages, which provides
means for systematic program transformations and
formal design verification.

The FP hardware description languages provide an
integrated framework for the following phases of design:
(i) Specification: capturing proper behavior, (ii)
Implementation: obtaining a suitable structure
(implementation), and (iii) Optimization: refinement of the
implementation to satisfy realization constraints.

The differences between functional programming
languages (FP) and imperative languages as general
programming languages have been discussed in depth in
[1,3]. The previous work on the use of functional
languages as HDLs [13,9,8,4,10,11,16,15]
discusses key ideas and the tradeoffs of this class of
programming languages compared to imperative HDLs.

A functional program, being an expression, is clearly
suitable for describing combinational networks. Most of
the previous work on functional languages as HDLs dealt
with the combinational systems only. Recently,
approaches to deal with sequential networks have been
discussed in [15,20].

Overview  of  the  article:  The  principal  contributions 
discussed  here  are:  (i)  FP  language  extensions  to  deal 
with  sequential  networks,  and  (ii)  an  attribute  system  to 
deal  in  a  general  way  with  the  evaluation  of  characteris¬ 
tics  of  designs.  Section  2  discusses  the  FP  language 
(FHDL),  its  sequential  constructs,  and  the  handling  of 
attributes.  The  main  features  that  support  transforma¬ 
tions  from  the  algorithmic  level  into  the  logic  design  level 
are  emphasized.  Several  functional  forms  have  been  ad¬ 
ded  to  support  the  specification  of  sequential  systems. 
We  illustrate  these  forms  in  both  the  behavioral  and 
structural  domains. 

To  aid  the  designer  in  estimating  the  performance 
parameters  at  various  levels  of  abstraction  of  the  designs 
obtained  from  FHDL  specifications,  FHDL  provides 
means  of  defining  and  evaluating  the  system  characteris¬ 
tics  using  attributes  such  as  propagation  delay  and  the 
number  of  logic  levels.  The  paper  concludes  with  a  more 
complex  example  in  Section  3  illustrating  the  main  capa¬ 
bilities  of  FHDL. 

2.  The  Language  (FHDL),  Sequential  Constructs  and 
Symbolic  Interpretation 

The language FHDL is an enhancement of the FP
language defined in [17]. FHDL can be used to specify
synchronous sequential networks. A sequence that is
produced sequentially by a synchronous sequential
machine can also be produced spatially by a combinational
iterative network [12,7,5]. Thus, a synchronous
sequential system can be specified in FHDL using its
spatial equivalent, a combinational iterative network.

In order to transform the FHDL expressions from
the behavioral domain to the structural domain, a
symbolic interpreter is used. In this interpretation the functions
operate on objects which are values or symbols to
produce the logic diagram or a net-list. To provide
information on the performance parameters, the symbolic
interpreter evaluates a number of attributes associated with
each primitive function. These attributes represent
implementation characteristics such as propagation delay and
number of logic levels.

The  transformation  of  FHDL  expressions  from  the 
behavioral  domain  to  the  structural  domain  requires  an 
instantiation  of  the  specification  with  an  object  of  given 
dimensions. For example, the description of a
multiplexer can be used for any size multiplexer. That is, a
4-input and a 32-input multiplexer have the same
description; for the actual structural realization, the size of
the input object must be known.

We  now  consider  the  use  of  objects,  functions  and 
functional  forms  of  FHDL  in  the  structural  domain. 

Objects In the symbolic domain, objects are associated
with both symbols and values; functions operate on
objects to generate new symbols, except in the case of
predicates. Since predicates are used to control the flow
of data, they operate only on values. Thus, in general
each atom must contain a symbol and a value. In order
to obtain design characteristics, the values of various
attributes are passed along with each object so that each
function can update these values depending on its
characteristics.

In general, an atom in the symbolic domain has
the following form:

(symbol-or-name value optional-list-of-attribute-values)

Currently the symbolic interpreter supports only two
types of attributes: the propagation delay (D) and the
number of logic levels (L). For example, the atom
"(MUXIN1 1 25 3)" can be interpreted as a wire or connection
called "MUXIN1" with a value of 1. The delay of the
corresponding signal is 25 units of time, and the signal
has passed through 3 levels of logic. It should be noted
that a predicate like atom returns the true token "(DUMMY
1 0 0)" when applied to an object like "(MUXIN1 1 10
2)", since this object is an atom in the structural domain.

Functions In the structural domain, functions operate on
symbols, values and attributes. For example, the or function
applied to ((IN1 1 5 1)(IN2 0 9 2)) will produce an atom
(WIRE.01 1 18 3). Three categories of functions that
appear in FHDL must be considered. First, there are the
functions that perform basic boolean operations such as
and, or, xor, nand, nor, and not. These functions are
mapped directly to the corresponding logic gates. The second
category contains basic interconnection functions such as
"select" and "distribute left". These functions never
create new atoms and their effect is independent of the
value of their input atoms. They merely rearrange the
atoms within an FP object, possibly leaving some out and
replicating others [18]. Third, there are functions that
are introduced for ease of describing algorithms.
Examples are length, atom, null, and predicates. Predicates
usually have a dual purpose. They are sometimes used to
manage the flow of control and ease the description of
algorithms, while in other instances a predicate is
mapped directly to a low-level implementation. In the FHDL
symbolic interpreter, conditional constructs and
predicates are used solely to control the flow of data and to
ease algorithm description. For functions like length
there exists no mapping to a lower level.

The boolean functions include time delay and logic
level attributes. Each time a boolean operator is applied,
the corresponding delay and logic level attributes are
updated as follows:

$D_{out} = \max(D_i, D_j) + D_f$

$L_{out} = \max(L_i, L_j) + 1$

where $D_i$ and $D_j$ denote the time delays of the inputs, $L_i$
and $L_j$ represent the logic levels of the inputs, and $D_f$ is
the propagation delay of the operator. The equations are
interpreted in the worst-case sense: the time delay
attribute of the output signal is equal to the maximum time delay
of the inputs plus $D_f$, the time delay of the function
(gate). The logic level attribute of the output signal is
equal to the maximum logic level attribute of the inputs plus
one. The following is the output of the symbolic
interpreter for applying the function HALFADDER to the
input "((AO 1 0 0)(B1 1 0 0))", which illustrates how the
boolean functions operate on attributes.

> 1 HALFADDER.841
> 1 ((AO 1 0 0)(B1 1 0 0))
> 2 AND.842
> 2 (AO 1 0 0)(B1 1 0 0)
> 2 (WIRE.843 1 9 1)
> 2 XOR.844
> 2 (AO 1 0 0)(B1 1 0 0)
> 2 (WIRE.845 0 16 1)
> 1 ((WIRE.843 1 9 1)(WIRE.845 0 16 1))

The number following ">" is the level of nested function
calls. Each function occupies three lines. The first line
has the function name with a number appended at the
end, to make a unique function name. The second line
lists the inputs, and the third line has the list of outputs.
The function HALFADDER.841 has a worst-case
propagation delay of 16 units of time and has one logic level
(as shown by atom WIRE.845). The gate delay of the and
primitive is 9 units of time and the gate delay of the xor gate
is 16 units (the Mead and Conway [14] timing model is
considered).
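
The attribute update rules can be mimicked in a few lines of Python. The following is only a sketch of the idea, not the interpreter itself: the `Atom` tuple, the `apply_gate` helper and the wire numbering are hypothetical, the AND and XOR delays are taken from the trace above, and the OR delay of 9 units is inferred from the or example and should be treated as an assumption.

```python
from collections import namedtuple
from itertools import count

# An atom in the structural domain: (symbol value delay levels).
Atom = namedtuple("Atom", "symbol value delay levels")

# AND = 9 and XOR = 16 come from the half-adder trace;
# OR = 9 is inferred from the "or" example (an assumption).
GATE_DELAY = {"and": 9, "or": 9, "xor": 16}
GATE_FUNC = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

_wire_ids = count(1)  # hypothetical numbering scheme for new wires


def apply_gate(name, x, y):
    """Apply a two-input gate and update the D and L attributes."""
    return Atom(
        symbol=f"WIRE.{next(_wire_ids)}",
        value=GATE_FUNC[name](x.value, y.value),
        delay=max(x.delay, y.delay) + GATE_DELAY[name],  # worst-case delay
        levels=max(x.levels, y.levels) + 1,              # one more logic level
    )


def half_adder(a, b):
    """Return (carry, sum), mirroring a construction of and and xor."""
    return apply_gate("and", a, b), apply_gate("xor", a, b)


carry, total = half_adder(Atom("AO", 1, 0, 0), Atom("B1", 1, 0, 0))
print(carry[1:], total[1:])  # value, delay and levels of each output wire
```

The carry wire ends up with delay 9 and the sum wire with delay 16, both at logic level 1, which agrees with the attributes shown in the trace.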

Functional  Forms  We  now  discuss  the  implementation  of 
functional  forms  of  FHDL  in  the  structural  domain.  The 
structure  shown  for  each  functional  form  is  generated  by 
the  symbolic  interpreter  and  a  graphic  interpreter  [18]. 

a) Composition Functional Form

f @ g : x

Figure 1. Interpretation of Composition

b) Construction Functional Form

[f1,f2,f3,f4] : x

Figure 2. Interpretation of Construction

c) Constant Functional Form

%c : x == (name c 0 0)

d) Right-Insert Functional Form

!f : x where x = (x1 x2 x3 x4 x5)

Figure 3. Interpretation of Right-Insert

e) Left-Insert Functional Form

\f : x where x = (x1 x2 x3 x4 x5)

Figure 4. Interpretation of Left-Insert

f) Tree-Insert Functional Form

|f : x where x = (x1 x2 x3 x4 x5 x6 x7 x8)

Figure 5. Interpretation of Tree-Insert
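
The structural difference between the three insert forms can be seen in a small Python sketch. It assumes the usual FP definitions (right insert nests to the right, left insert nests to the left, tree insert splits the sequence in half), handles non-empty sequences only, and uses hypothetical helper names; it is not taken from the FHDL interpreter.

```python
from functools import reduce


def right_insert(f, xs):
    """Right insert: f(x1, f(x2, ... f(xn-1, xn)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs[:-1]), xs[-1])


def left_insert(f, xs):
    """Left insert: f(... f(f(x1, x2), x3) ..., xn)."""
    return reduce(f, xs)


def tree_insert(f, xs):
    """Tree insert: combine the results of the two halves of the sequence."""
    if len(xs) == 1:
        return xs[0]
    mid = (len(xs) + 1) // 2
    return f(tree_insert(f, xs[:mid]), tree_insert(f, xs[mid:]))


def show(a, b):
    """Record the nesting structure as a string."""
    return f"({a} {b})"


def level(a, b):
    """Logic-level attribute of a two-input gate fed by levels a and b."""
    return max(a, b) + 1


names = ["x1", "x2", "x3", "x4"]
print(right_insert(show, names))
print(left_insert(show, names))
print(tree_insert(show, names))
print(right_insert(level, [0] * 8), tree_insert(level, [0] * 8))
```

With eight inputs the linear forms pass through seven levels of f while the tree form needs three, which is why the choice of insert form matters once delay and logic-level attributes are evaluated.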

Apply-to-All Functional Forms

Two types of apply-to-all functional forms are
provided for interfacing the sequential functions to the
combinational ones and for specifying algorithms in general.
These forms are equivalent at the behavioral level, while
they differ in the structural domain. This distinction is
caused by modeling the sequential systems by iterative
networks in the behavioral domain.

a) Space-Apply-to-All Functional Form (&)

&f : x == (f:x1, f:x2, ..., f:xn),

where x is a spatial sequence of elements.

The space-apply-to-all functional form at the structural level
is mapped to n copies of function f. The function
&f operates on all elements of the input object
concurrently and produces the results simultaneously. This
form is equivalent to the apply-to-all functional form defined
by Backus [1].

&f : x where x = (x1 x2 x3 x4 x5)

Figure 6. Interpretation of &

b) Time-Apply-to-All Functional Form ($)

$f : x == (f:x_t1, f:x_t2, ..., f:x_tm),

where x is a time sequence of elements.

At the structural level time-apply-to-all is mapped to one
copy of function f. Input x is a time object because the
implementation of $f implies that the elements of the
input are applied to f one at a time. In other words, x_t1 is
the input at time t1, x_t2 is the input at time t2, and so on.
In the structural domain, the time-apply-to-all functional
form only operates on one element of the input at a time
(the symbolic interpreter uses the first element of the
input objects). Time-apply-to-all consumes an input
generated by a sequential system and produces an output to
be used by a sequential system.

$f : x where x = (x_t1 x_t2 x_t3 x_t4 x_t5)

Figure 7. Interpretation of $

We now discuss how these functional forms are
used to describe sequential systems.

Sequential Functional Forms

The abstract model of a synchronous sequential
system is a finite state machine where x is the input and
z is the output, while y is the present state and Y is the
next state. The following three functional forms are
provided as the implementation of finite state machines.
The previous argument used to distinguish between the
time and space domain implementations applies directly
to these functional forms. All of the following functional
forms are implemented in the time domain and applied to
time objects.

a) Sequence Functional Form

(seq init seqfunc g seqend) : x ==
    (g:(init:x, x_t1), g:(g:(init:x, x_t1), x_t2), ...),
where x is a time object.

Keywords seq, seqfunc, and seqend act as delimiters.
The functions init and g are the initialization and
state-transition functions. The sequence functional form describes
a finite state machine where the output vector z is the
same as the present state y.

Figure 8. Structural Interpretation of
Sequence Functional Form

The following example of a bit-serial adder illustrates the
use of the sequence functional form.

# Bit-serial adder using the sequence functional form

defun seqadder ()          # OUTPUT ((C0 S0)...(Cn-1 Sn-1))
  seq [%0,%0]              # C-1 = 0; S-1 = 0
  seqfunc fulladder @
          apndl @ [1@1,2]
  seqend                   # INPUT ((A0 B0)...(An-1 Bn-1))
enddef

defun cpa1 ()              # OUTPUT ((S0 S1 ... Sn-1) Cn-1)
  [&2,                     # select Sums
   1@last] @               # select Cn-1
  seqadder
enddef                     # INPUT ((A0 B0)...(An-1 Bn-1))

Note that in FP programs functions are applied from
right to left.
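
A behavioral reading of the sequence functional form can be sketched in Python as a generator that consumes a time object one element per step. This is only an illustration under stated assumptions: the names `seq`, `seqadder` and `cpa1` echo the FHDL identifiers but the code is not the FHDL interpreter, and the operand bits are assumed to arrive least significant first.

```python
def full_adder(a, b, c):
    """Return (carry, sum) for three one-bit inputs."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def seq(init, g, xs):
    """Sequence form: the output at each step is the new state."""
    state = init(xs)
    for x in xs:              # x_t1, x_t2, ... arrive one per clock period
        state = g(state, x)
        yield state


def seqadder(pairs):
    """Bit-serial adder over pairs (A_i, B_i), least significant bit first."""
    def init(_xs):
        return (0, 0)         # initial carry and sum are both 0

    def g(state, ab):
        return full_adder(ab[0], ab[1], state[0])  # reuse the previous carry

    return list(seq(init, g, pairs))


def cpa1(pairs):
    """Collect the sum bits and the final carry."""
    out = seqadder(pairs)
    return [s for _, s in out], out[-1][0]


# 6 + 7 with four-bit operands, least significant bit first
print(cpa1([(0, 1), (1, 1), (1, 1), (0, 0)]))
```

The sum bits come out as 1, 0, 1, 1 (least significant first), i.e. 13, with a final carry of 0.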


b) Mealy Functional Form

(mealy init meout h menext g meend) : x ==
    (h:(init:x, x_t1), h:(g:(init:x, x_t1), x_t2),
     h:(g:(g:(init:x, x_t1), x_t2), x_t3), ...)
where x is a time object.

The keywords mealy, meout, menext, and meend are
FHDL delimiters. The functions init, h, and g are the
initialization, output, and state-transition functions,
respectively. In a Mealy machine, the output depends on both
the present state and the input. Function init provides an
initial value of the state register.

Figure 9. Structural Interpretation of
Mealy Functional Form
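
The Mealy form can be read behaviorally as follows. The Python sketch below is an assumption-laden illustration, not FHDL: the generator `mealy` follows the defining expansion above, while the serial adder used to exercise it keeps only the carry as state and emits the sum bit directly, which is a simplification chosen for this example.

```python
def mealy(init, h, g, xs):
    """Mealy form: output h(state, x) first, then move to g(state, x)."""
    state = init(xs)
    for x in xs:
        yield h(state, x)
        state = g(state, x)


def serial_sum(pairs):
    """Sum bits of a bit-serial addition, least significant bit first."""
    def init(_xs):
        return 0                                   # initial carry

    def h(carry, ab):
        return ab[0] ^ ab[1] ^ carry               # output: sum bit

    def g(carry, ab):
        a, b = ab
        return (a & b) | (a & carry) | (b & carry)  # next state: carry

    return list(mealy(init, h, g, pairs))


print(serial_sum([(0, 1), (1, 1), (1, 1), (0, 0)]))  # 6 + 7, LSB first
```

Because the output function sees the current input, the sum bit is available in the same step in which the operand bits arrive.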


In the following example a bit-serial adder is specified
using the Mealy functional form.

# Bit-serial adder using Mealy functional form

defun mealyadder ()        # OUTPUT ((C0 S0)...(Cn-1 Sn-1))
  mealy [%0,%0]            # C-1 = 0; S-1 = 0
  meout 1                  # past Carry and Sum
  menext fulladder @
         apndl @ [1@1,2]
  meend                    # INPUT ((A0 B0)(A1 B1)...(An-1 Bn-1))
enddef

defun cpa3 ()              # OUTPUT ((S0 S1 ... Sn-1) Cn-1)
  [&2,                     # select Sums
   1@last] @               # select Cn-1
  mealyadder
enddef                     # INPUT ((A0 B0)...(An-1 Bn-1))

c) Moore Functional Form

(moore init moout h monext g moend) : x ==
    (h:(init:x), h:(g:(init:x, x_t1)),
     h:(g:(g:(init:x, x_t1), x_t2)), ...)
where x is a time object.

The keywords moore, moout, monext, and moend are
FHDL delimiters. The functions init, h, and g are the
initialization, the output and the state-transition functions,
respectively. In a Moore machine the output depends on
the present state only.

Figure 10. Structural Interpretation of
Moore Functional Form

The bit-serial adder is now specified using the
Moore functional form.

# Bit-serial adder using Moore functional form

defun mooreadder ()        # OUTPUT ((C0 S0)...(Cn-1 Sn-1))
  moore [%0,%0]            # C-1 = 0; S-1 = 0
  moout id                 # past Carry and Sum
  monext fulladder @
         apndl @ [1@1,2]
  moend                    # INPUT ((A0 B0)(A1 B1)...(An-1 Bn-1))
enddef

defun cpa2 ()              # OUTPUT ((S0 S1 ... Sn-1) Cn-1)
  [&2,                     # select Sums
   1@last] @               # select Cn-1
  mooreadder
enddef                     # INPUT ((A0 B0)...(An-1 Bn-1))

3. Example

We now illustrate the use of apply-to-all and
sequential functional forms by considering the
specification and implementation of a multi-operand carry-save
adder. The multi-operand carry-save adder uses carry-save
logic with feedback in order to perform additions. The
final partial sum and the carries are passed to a
carry-propagate adder to generate the final result. A
high-level logic schematic of the adder is shown in Figure 11.
The following is an FHDL description of the
multi-operand carry-save adder.

defun fulladder ()         # OUTPUT (Ci+1 Si)
  [(1 or 2),3] @
  apndl @ [1,halfadder@[2,3]] @
  apndr @ [halfadder@[1,2],3]
enddef                     # INPUT (Ai Bi Ci)

defun halfadder ()         # OUTPUT (Ci+1 Si)
  [and,xor]
enddef                     # INPUT (Ai Bi)

defun initcsa ()           # initial value of state
  [(&[%0,%0]),%0] @ 1
enddef

defun addall ()            # OUTPUT one triple per bit position
  &apndr @ trans @ [1@tlr@1,2]
enddef                     # INPUT (state operand)

defun rearrangeoutput ()   # OUTPUT ((0 S0)(C0 S1)...(Cn-1))
  pair @ apndl @ [%0,id] @ concat
  @ &reverse
enddef                     # INPUT ((C0 S0)(C1 S1)...(Cn-1 Sn-1))

defun csa ()               # OUTPUT next state: (pairs carry)
  [tlr,1@last] @
  rearrangeoutput @
  &fulladder
  @ addall
enddef                     # INPUT (state operand)

defun multiopcsa ()        # OUTPUT time sequence of states
  seq initcsa seqfunc csa seqend
enddef                     # INPUT time sequence of operands

defun multiopadder ()      # OUTPUT sum bits followed by carry
  (apndr @ [cpa@1,2]) @ last
  @ $([apndl@[[%0,%0],1],2])
  @ multiopcsa
enddef                     # INPUT time sequence of operands

The function multiopcsa specifies the carry-save adder.
The function multiopadder connects the carry-save adder
(the sequential system) to the carry-propagate adder
(the combinational circuit). The important aspect of this
example is the use of the sequential functional form in
the csa function and the use of the time-apply-to-all functional
form in function multiopadder to interface between the
sequential part and the combinational one.

Figure 11. A Multi-operand Carry-save Adder

Figures 12, 13, and 14 show the logic diagrams of
functions multiopadder, multiopcsa, and csa, respectively.

Figure 12. Logic Diagram of Multi-Operand-Adder

Figure 13. Logic Diagram of Sequential CSA

Figure 14. Logic Diagram of a CSA

For the carry-propagate adder (CPA), function cpa, any of
the specifications in Section 2 can be used. A more
detailed discussion is given in [15]. As mentioned earlier,
two different tools are used in this work. First, the
functional interpreter provides a functional simulation of the
FHDL specification. Second, the symbolic interpreter is
used to transform the specification from the behavioral
domain to the structural domain. The following is a
sample of the functional simulation of the multiopadder and
multiopcsa functions.

multiopcsa:((0 0 0 0)(0 0 0 1)(0 1 1 1)(0 0 1 0))

((((0 0) (0 0) (0 0) (0 0)) 0) (((0 0) (0 0) (0 0) (0 1)) 0)
(((0 0) (0 1) (1 1) (0 0)) 0) (((0 0) (1 1) (0 1) (0 0)) 0))

multiopadder:((0 0 0 0)(0 0 0 1)(0 1 1 1)(0 0 1 0))

(1 0 1 0 0)
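
The sample simulation can be reproduced with a short behavioral model in Python. This sketch follows the numbers printed above rather than the FHDL listing: it assumes bit vectors are written most significant bit first, that each state pair holds (carry, sum) for one bit position, and that carries move one position toward the most significant end on every step. All function names are hypothetical.

```python
def full_adder(a, b, c):
    """Return (carry, sum) for three one-bit inputs."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def init_csa(width):
    """Initial state: a (carry, sum) pair of zeros per position, carry-out 0."""
    return [(0, 0)] * width, 0


def csa(state, x):
    """One carry-save step; vectors are most significant bit first."""
    pairs, _ = state
    results = [full_adder(xi, s, r) for (r, s), xi in zip(pairs, x)]
    carries = [c for c, _ in results]
    sums = [s for _, s in results]
    # each carry moves one position toward the most significant end
    return list(zip(carries[1:] + [0], sums)), carries[0]


def multiopcsa(operands):
    """Return the state after each operand, one operand per time step."""
    state = init_csa(len(operands[0]))
    states = []
    for x in operands:
        state = csa(state, x)
        states.append(state)
    return states


def cpa(pairs):
    """Ripple carry-propagate addition of the carry and sum vectors."""
    carry, bits = 0, []
    for r, s in reversed(pairs):
        carry, bit = full_adder(r, s, carry)
        bits.append(bit)
    return bits[::-1], carry


def multiopadder(operands):
    """Sum bits followed by the carry-out held in the final CSA state."""
    pairs, csa_carry = multiopcsa(operands)[-1]
    bits, _ = cpa(pairs)
    return bits + [csa_carry]


ops = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 1, 1, 1), (0, 0, 1, 0)]
print(multiopcsa(ops)[-1])
print(multiopadder(ops))
```

For the four operands 0, 1, 7 and 2 the model ends in the same final state as the sample and returns the bits 1 0 1 0 0, i.e. the sum 10 with a zero carry.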

The simulation of multiopadder in the structural domain
using the symbolic interpreter is given below. Only the
high-level modules are shown.

> 1 MULTIOPADDER.408
> 1 (((IN11 1 0 0)(IN12 1 0 0)(IN13 1 0 0)(IN14 1 0 0))((IN21 1 0 0)
    (IN22 1 0 0)(IN23 1 0 0)(IN24 1 0 0)))
> 2 MULTIOPCSA.409
> 2 (((IN11 1 0 0)(IN12 1 0 0)(IN13 1 0 0)(IN14 1 0 0))((IN21 1 0 0)
    (IN22 1 0 0)(IN23 1 0 0)(IN24 1 0 0)))
> 3 CSA.419
> 3 (((((CONST.410 0 0 0)(CONST.411 0 0 0))((CONST.412 0 0 0)
    (CONST.413 0 0 0))((CONST.414 0 0 0)(CONST.415 0 0 0))
    ((CONST.416 0 0 0)(CONST.417 0 0 0)))(CONST.418 0 0 0))
    ((IN11 1 0 0)(IN12 1 0 0)(IN13 1 0 0)(IN14 1 0 0)))
> 3 ((((CONST.474 0 0 0)(WIRE.431 1 32 2))((WIRE.433 0 34 3)
    (WIRE.444 1 32 2))((WIRE.446 0 34 3)(WIRE.457 1 32 2))
    ((WIRE.459 0 34 3)(WIRE.470 1 32 2)))(WIRE.472 0 34 3))
> 3 STATE.473
> 3 ((((CONST.474 0 0 0)(WIRE.431 1 32 2))((WIRE.433 0 34 3)
    (WIRE.444 1 32 2))((WIRE.446 0 34 3)(WIRE.457 1 32 2))
    ((WIRE.459 0 34 3)(WIRE.470 1 32 2)))(WIRE.472 0 34 3))
> 3 ((((CONST.410 0 0 0)(CONST.411 1 0 0))((CONST.412 0 0 0)
    (CONST.413 1 0 0))((CONST.414 0 0 0)(CONST.415 1 0 0))
    ((CONST.416 0 0 0)(CONST.417 1 0 0)))(CONST.418 0 0 0))
> 2 ((((CONST.410 0 0 0)(CONST.411 1 0 0))((CONST.412 0 0 0)
    (CONST.413 1 0 0))((CONST.414 0 0 0)(CONST.415 1 0 0))
    ((CONST.416 0 0 0)(CONST.417 1 0 0)))(CONST.418 0 0 0))
> 2 CPA.478
> 2 ((((CONST.476 0 0 0)(CONST.477 0 0 0))((CONST.410 0 0 0)
    (CONST.411 1 0 0))((CONST.412 0 0 0)(CONST.413 1 0 0))
    ((CONST.414 0 0 0)(CONST.415 1 0 0))((CONST.416 0 0 0)
    (CONST.417 1 0 0))))
> 2 ((WIRE.490 1 32 2)(WIRE.504 1 50 4)(WIRE.518 1 68 6)
    (WIRE.532 1 86 8))
> 1 ((WIRE.490 1 32 2)(WIRE.504 1 50 4)(WIRE.518 1 68 6)
    (WIRE.532 1 86 8)(CONST.418 0 0 0))

The symbolic interpreter does not provide the full
simulation of the FHDL specification. The interpreter
simulates the specification only to the level necessary to
transform the specification to the structural domain. As
illustrated above, the function multiopadder was applied
to the input consisting of two numbers represented with
bit-vectors 1111 and 1111. The output of the symbolic
interpreter was 1111, because the interpreter executes the
multiopcsa function only once. That is, it adds the first
input to the initial value of the state register. The initial
value of the state register, provided by the function initcsa,
is zero.

The minimum clock period of multiopadder is 34
units of time plus the time delay of the state register (i.e.,
8 units of time). The delay of the cpa function is 86 units of
time. Thus, for adding 20 numbers about
20*(34 + 8) + 86 = 926 units of time are required.
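
The estimate generalizes to any number of operands. The helper below is a hypothetical convenience, with default values taken from the attribute figures quoted above; it simply charges one clock period per operand and then one pass through the carry-propagate adder.

```python
def multiop_add_time(n_operands, csa_delay=34, register_delay=8, cpa_delay=86):
    """Time units needed to add n_operands numbers with the multi-operand adder."""
    clock_period = csa_delay + register_delay   # one carry-save step per operand
    return n_operands * clock_period + cpa_delay


print(multiop_add_time(20))
```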

As  illustrated  above,  FHDL  can  be  used  to  specify 
digital  systems,  to  map  the  specification  into  a  gate  level 
implementation,  and  to  simulate  its  functional  behavior. 
The  use  of  attributes  provides  a  systematic  method  for 
gathering  performance  characteristics  of  the  design. 

4. Conclusion

A functional programming hardware description
language (FHDL), based on Backus's FP, was
described. FHDL supports the specification of both
combinational and sequential systems. The sequential
systems are modeled by equivalent iterative networks at the
behavioral level. Then, a symbolic interpreter is used to
interpret the specifications automatically at the structural
level, which presently consists of gates, registers and
interconnections. Characteristic attributes are introduced
in a general manner so that the information about system
performance and design parameters can be extracted.
The two implemented attributes are the delay and the
number of levels.

A generalization of low-cost residue codes into
two-dimensional encodings was presented and error-detecting
and error-correcting properties of two-dimensional
inverse residue codes were discussed previously. This
paper presents byte-serial checking, additive inverse
(complementation), and addition algorithms for
operands encoded in two-dimensional residue and
inverse residue codes.


1. Introduction


A general approach to the cost and effectiveness study
of low-cost arithmetic error codes has been presented in
[AVIZ 71a]. That paper introduced the concepts of
inverse residue codes and of multiple arithmetic error
codes. The concept of repeated-use faults was presented
and the effectiveness of various arithmetic codes with
respect to both determinate and indeterminate
repeated-use faults was established. An important result was the
proof that inverse residue codes can detect the
"compensating" determinate repeated-use faults that are not
detected by ordinary residue codes. The modulo 15
inverse residue code was applied in the JPL-STAR
experimental computer [AVIZ 71b]. Further results on
determinate faults were presented in [PARH 73] and [PARH
78]. An extension to signed-digit arithmetic is found in
[AVIZ 81]. J. F. Wakerly has analyzed the
detectability of unidirectional multiple errors [WAKE 75], and
A. M. Usas has demonstrated the advantages of inverse
residue codes for multiple unidirectional error detection,
when compared to inverse checksum codes [USAS 78].
Bose and Rao have considered unidirectional one-line
error correcting codes using a combination of byte
parity and residue (not low-cost) encoding [BOSE 80].


A new generalization presented in [AVIZ 83] extended
the application of low-cost inverse residue codes into
two dimensions: row (byte) and column (line) residues.
This extension improves the detection of errors,
especially of those due to indeterminate faults, and provides
certain error-correction capabilities. Of special interest
to current VLSI implementations of arithmetic are the
advantages offered by two-dimensional inverse residue
codes in the detection and correction of errors that
affect byte-wide communication paths and processing
elements. Such paths are widely used in high-performance
array processors, systolic arrays, and for inter-processor
communication in large multiprocessor systems.
Byte-wide processing elements are very suitable for the
implementation of large processing arrays [AVIZ 70],
[TUNG 70] and variable-precision signed-digit
arithmetic [AVIZ 62].


* This research has been supported by ONR contract N00014-83-


This  paper  presents  the  fundamental  byte-serial  arith¬ 
metic  algorithms  for  operands  encoded  in  two- 
dimensional  low-cost  inverse  residue  codes.  The  algo¬ 
rithms  are: 


(a)  the  line-residue  checking  algorithm; 


(b)  the  additive  inverse  (complementation)  algo¬ 
rithm; 


(c)  the  addition  algorithm. 


A  brief  review  of  the  error-detecting  and  error- 
correcting  properties  of  2-D  inverse  residue  codes  fol¬ 
lows  the  description  of  arithmetic  algorithms. 


2.  Model  of  the  Byte-Serial  Communication  and 
Computation  Path 


We consider a communication and computation path
consisting of b bit lines $(X^0, \ldots, X^{b-1})$. The binary
operand X consists of kb bits, processed as k bytes
$(X_0, \ldots, X_i, \ldots, X_{k-1})$ of b bits length each. Figure
1 shows the notation used in this paper.


Two types of low-cost residue encoding are applicable to
the operand X:


(a) Residue Code: the k bytes carry an
error-detecting code check byte $X_k$ that represents the
modulo $2^b-1$ residue $X'$ of the operand X:
$X' = (2^b-1) \,|\, X$; and the operand is now k+1 bytes
long. Usually the residue value $X'=0$ is
represented by a string of b ones. If the all-zero
operand can exist, its residue will be b zeros,
unless explicitly disallowed.

(b) Inverse Residue Code: the inverse residue byte $X_k$
represents the value $X''$ that is the $(2^b-1)$'s
complement of $X'$. It is obtained as
$X'' = (2^b-1) - X' = (2^b-1) - (2^b-1) \,|\, X$; and the
operand is again k+1 bytes long. The residue
value $X'=0$ is represented by b ones, and the
inverse residue $X''$ in this case is represented by b
zeros. The all-zero operand X has an inverse
residue code $X''$ represented by b ones.
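
The byte-level encoding can be illustrated with a short Python sketch. It is a minimal model under stated assumptions: the function names are hypothetical, the operand is held as a Python integer, and the check routine treats a non-zero byte sum that is a multiple of 2^b - 1 as the "all ones" outcome.

```python
def split_bytes(value, k, b):
    """Split a k*b-bit operand into k bytes, least significant byte first."""
    mask = (1 << b) - 1
    return [(value >> (i * b)) & mask for i in range(k)]


def low_cost_residue(value, b):
    """Residue modulo 2**b - 1; a zero residue of a non-zero operand is b ones."""
    modulus = (1 << b) - 1
    r = value % modulus
    if r == 0 and value != 0:
        return modulus
    return r


def inverse_residue(value, b):
    """The (2**b - 1)'s complement of the residue."""
    return ((1 << b) - 1) - low_cost_residue(value, b)


def encode_inverse(value, k, b):
    """Operand bytes followed by the inverse residue check byte."""
    return split_bytes(value, k, b) + [inverse_residue(value, b)]


def byte_check(word, b):
    """True when the bytes sum to a non-zero multiple of 2**b - 1."""
    total = sum(word)
    return total > 0 and total % ((1 << b) - 1) == 0


word = encode_inverse(0x1234, 2, 8)
print(word, byte_check(word, 8))
```

The residue is "low-cost" because 2^b is congruent to 1 modulo 2^b - 1, so summing the bytes (with end-around carry in hardware) gives the residue of the whole operand without a division.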

To form a two-dimensional residue encoding, one more
check line $X^b$ is added to the communication and
computation path (Figure 1). The lines $X^{b-1}, \ldots, X^0$ are
summed modulo $2^{k+1}-1$ to get the line-residue of X.
Two classes of 2-D low-cost residue codes can be
employed:

(c) Two-dimensional Residue Code: the check bits $x_i^b$
of the check line $X^b$ represent the modulo
$2^{k+1}-1$ line-residue $X_L$:

$X_L = (2^{k+1}-1) \,|\, \sum_{j=0}^{b-1} X^j$, where $X^j = \sum_{i=0}^{k} x_i^j 2^i$

(d) Two-dimensional Inverse Residue Code: the
check bits $x_i^b$ on the check line $X^b$ represent the
modulo $2^{k+1}-1$ inverse line-residue $X_L''$:

$X_L'' = (2^{k+1}-1) - X_L$

It is important to note that the bits $(x_k^{b-1}, \ldots, x_k^0)$ of the
check byte $X_k$ are treated as the most significant bits of
the lines $X^{b-1}, \ldots, X^0$ when the line-residue of X is
determined. The line-residue encoding is superimposed
on the already encoded operand.


Check line  line              line              line     line     byte
X^b         X^{b-1}     ...   X^j         ...   X^1      X^0      symbol

x_0^b       x_0^{b-1}   ...   x_0^j       ...   x_0^1    x_0^0    X_0
x_i^b       x_i^{b-1}   ...   x_i^j       ...   x_i^1    x_i^0    X_i
x_{k-1}^b   x_{k-1}^{b-1} ... x_{k-1}^j   ...   x_{k-1}^1 x_{k-1}^0 X_{k-1}
x_k^b       x_k^{b-1}   ...   x_k^j       ...   x_k^1    x_k^0    X_k (check)


Figure 1. Model of the Path and the Operand X


3.  A  Dyte-Sertal  Line-Residue  Checking  Algorithm 

Given  a  Two-Dimensional  Inverse  Residue  encoded 
operand  X  (as  shown  in  Figure  1),  the  line-residue 
checking  algorithm  requires  the  computing  of  the  line 
check  result  R(L): 

R(L)  =»  (2*+,-l)  |  £x';  where  X>  =*  2X/2* 
j-o  1-0 

An  "all-era  es"  R(L)  indicates  a  valid  encoding;  all  other 
values  of  R(L)  indicate  an  error. 

In  the  byte-serial  implementation,  one  byte  of  X  be¬ 
comes  available  at  a  time,  with  the  least  significant  byte 

X0  arriving  first.  The  "line  sum"  ^X1  designated  by 

J-o 

2,(L),  is  computed  as  the  weighted  sum  of  the  "count  of 
ones"  in  each  byte: 

2(t)-  £x'«  £(£*/)  * 

j-o  i-o  j-o 

In  order  to  get  R(L),  the  modulo  (24+1-l)  residue  of 
J](L)  must  be  computed.  That  requires  an  "end- 
around-carry"  addition  of  the  'overflow  bits" 

(S4+m, . S*+|)  of  5)(L)  that  are  in  the  positions 

1  +  1 1  +  m  of  £(L  )•  Sim*  a  full-length  addition  de¬ 
lay  is  inacceptable  in  high-speed  byte-organized  comput¬ 
ing  (e.g.,  systolic  arrays),  a  fast  line-residue  checking 

algorithm  is  developed  here  that  requires  only  a  short, 
m  bit  (with  2"  a:  b  +  1)  addition  after  £(L)  has  been 
byte-serially  computed. 

The  speed-up  is  accomplished  by  the  simultaneous  com¬ 
puting  of  two  tentative  line  sums  J)(L)  and 
2(2-)'  =  2(*-)  +  2".  The  value  of  m  it  determined  by 
the  maximum  value  of  the  line  sum  £(L).  An  upper 
bound  for  £(L)  is  obtained  by  assuming  all  digits 
Xj  (Ostsl;  Osjsb)  to  have  the  value  "1".  (This  situa¬ 
tion  cannot  occur  for  a  valid  encoding,  but  could  be 
caused  by  an  error.)  In  this  case, 

=  (*+l)(2*+,-l)  =  M*+,  +  (2*+,-l)~* 

and  the  "overflow  bits’  represent  the  value  b  with 
respect  to  the  position  1  +  1  of  the  line  sum  J](L).  The 
value  of  m  is  the  smallest  integer  that  satisfies  the  con¬ 
dition: 

2"-lah  or  T  ah+1 

For  example,  8-bit  bytes  (b=8)  will  need  m  =  4,  regard¬ 
less  of  operand  length  1  (in  bytes). 

After $\Sigma(L)$ and $\Sigma(L)' = \Sigma(L) + 2^m$ have been computed, the m "overflow bits" $(S_{k+m}, \ldots, S_{k+1})$ of $\Sigma(L)$ are added to the m least significant bits $(S_{m-1}, \ldots, S_0)$ of $\Sigma(L)$. The resulting carry-out $C_m$ determines the choice of the bits $(S_k, \ldots, S_m)$:

(a) If $C_m = 0$, $(S_k, \ldots, S_m)$ come from $\Sigma(L)$

(b) If $C_m = 1$, $(S_k, \ldots, S_m)$ come from $\Sigma(L)'$

The "all ones" line check result ($S_i = 1$; $0 \le i \le k$) indicates the absence of errors; any other line check result is an error indication. The determination whether $(S_k, \ldots, S_m)$ are "all ones" is done while these bits are computed. The final step is to test whether the bits $(S_{m-1}, \ldots, S_0)$ also are "all ones" after the "end-around" addition of the "overflow bits".
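
The selection between the two tentative sums can be sketched in software as follows. This is a minimal model of the steps described above, not a hardware description; the name `fast_line_check` is hypothetical, the line sum is assumed to be available as an integer, and the corner case in which the selected upper bits would themselves overflow is not examined.

```python
def fast_line_check(line_sum: int, k: int, m: int) -> int:
    """Line check result (k + 1 bits) from the byte-serially computed line sum.

    Only an m-bit addition is performed; the upper bits S_k..S_m are
    selected from one of the two tentative sums by the carry-out C_m.
    """
    width = k + 1
    low_mask = (1 << m) - 1
    overflow = line_sum >> width             # overflow bits S_{k+m}..S_{k+1}
    tentative = line_sum + (1 << m)          # second tentative sum: sum + 2**m
    low = (line_sum & low_mask) + overflow   # short m-bit end-around addition
    carry = low >> m                         # carry-out C_m
    chosen = tentative if carry else line_sum
    high = (chosen >> m) & ((1 << (width - m)) - 1)   # bits S_k..S_m
    return (high << m) | (low & low_mask)


# k + 1 = 8 bit lines, m = 3: a line sum of 510 = 2 * 255 is error-free
# and yields the "all ones" result.
assert fast_line_check(510, k=7, m=3) == 0b11111111
```

A line sum of 636 with the same parameters gives 01111110, and a sum of 263 exercises the branch with carry-out 1.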

4.  The  Byte-Serial  Additive  Inverse  Algorithm 

The additive inverse of an operand X is formed by obtaining the complement of X. Either "one's" or "two's" complements can be employed; the specifics are discussed in [AVIZ 71a] and [AVIZ 73].

The purpose of this section is to develop the corresponding complementation algorithm for the inverse line-residue $X^b$. When the "one's" complement $\bar{X}$ of X is formed, the "count of ones" in each byte $\bar{X}_i$ of $\bar{X}$ is:

$$\sum_{j=0}^{b-1} \bar{x}_i^j = b - \sum_{j=0}^{b-1} x_i^j$$

This leads to the relationship for $\Sigma(\bar{X})$:

$$\Sigma(\bar{X}) = \sum_{i=0}^{k} \Big( b - \sum_{j=0}^{b-1} x_i^j \Big) 2^i = b(2^{k+1}-1) - \Sigma(X)$$

Taking the line-residue modulo $(2^{k+1}-1)$, we get:

$$(2^{k+1}-1)\,|\,\Sigma(\bar{X}) = (2^{k+1}-1)\,|\,\{0 - [(2^{k+1}-1)\,|\,\Sigma(X)]\} = (2^{k+1}-1) - (2^{k+1}-1)\,|\,\Sigma(X)$$

The relationship demonstrates that the line-residue of the "one's" complement $\bar{X}$ is obtained by taking the one's complement $(2^{k+1}-1) - [(2^{k+1}-1)\,|\,\Sigma(X)]$ of the line-residue $(2^{k+1}-1)\,|\,\Sigma(X)$ that was computed for the operand X. The same argument follows for the inverse line-residue.
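
The complement relationship can be verified on a small operand. The sketch below is an illustration only: the helper names are assumptions, and the byte values are those of the encoded operand used in Example 1 (b = 4, eight bytes with the check byte last, check line omitted).

```python
def ones(v: int) -> int:
    """Count of ones in a byte."""
    return bin(v).count("1")


def count_sum(byte_values) -> int:
    """Count of ones of byte i weighted by 2**i (least significant byte first)."""
    return sum(ones(v) << i for i, v in enumerate(byte_values))


b = 4
X = [4, 3, 9, 0, 11, 2, 10, 6]             # k + 1 = 8 bytes
X_bar = [v ^ ((1 << b) - 1) for v in X]    # one's complement of every byte
A = (1 << len(X)) - 1                      # 2**(k+1) - 1 = 255

# Sum relationship, then the same statement on the residues.
assert count_sum(X_bar) == b * A - count_sum(X)
assert count_sum(X_bar) % A == (A - count_sum(X) % A) % A
```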

If the "two's" complement $X^*$ of X is to be formed, it is considered to be:

$$X^* = \bar{X} + 2^0$$

In order to get the line-residue of $X^*$, the line-residue of $2^0$ must be added modulo $2^{k+1}-1$ to the line-residue of $\bar{X}$. For inverse line-residue encoding, this means the addition of $C_0^0 = 1$, as described in the next section.


5.  The  Byte-Serial  Addition  Algorithm 

We consider the byte-serial addition of two operands X and Y, each kb bits long, to get the sum Z = X + Y. The addition is modulo $2^{kb}-1$ ("one's" complement), or modulo $2^{kb}$ ("two's" complement).

The check byte $Z_k$ is obtained by adding the check bytes $X_k$ and $Y_k$ modulo $2^b-1$. If "two's" complement is used, a "correction signal" input needs to be used as defined in [AVIZ 73].

An algorithm to generate the inverse line-residue for Z from the inverse line-residues of X and Y is developed here. As first developed by Garner [GARN 58], the carries generated during the addition of X and Y need to be employed in the calculation of the inverse line-residue for Z.

The "count of ones" (designated by $a(Z_i)$) in each byte $Z_i$ ($0 \le i \le k-1$) of Z is:

$$\sum_{j=0}^{b-1} z_i^j = \sum_{j=0}^{b-1} (x_i^j + y_i^j) - \Big( \sum_{j=1}^{b-1} C_i^j \Big) - 2C_i^b + C_i^0 ;$$

or: $a(Z_i) = a(X_i) + a(Y_i) - a(\mathrm{internal}\ C_i) - 2C_i^b + C_i^0$ ;

where $C_i^j$ is the carry into the j-th position of the sum byte $Z_i$, and $C_{i+1}^0 = C_i^b$ for $0 \le i \le k-2$. For "one's" complement addition of X and Y, we also have $C_{k-1}^b = C_0^0$.
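
The per-byte count relationship follows from the fact that each sum digit equals the two operand digits plus the incoming carry minus twice the outgoing carry. A small sketch that checks it with a ripple-carry model is given below; the function names are assumptions made for illustration.

```python
def ones(v: int) -> int:
    return bin(v).count("1")


def add_byte(x: int, y: int, carry_in: int, b: int):
    """Ripple-carry addition of two b-bit bytes.

    Returns the sum byte and the list of carries, where carries[j] is the
    carry into position j and carries[b] is the carry out of the byte.
    """
    carries = [carry_in]
    z = 0
    for j in range(b):
        total = ((x >> j) & 1) + ((y >> j) & 1) + carries[j]
        z |= (total & 1) << j
        carries.append(total >> 1)
    return z, carries


def count_identity_holds(x: int, y: int, carry_in: int, b: int) -> bool:
    z, c = add_byte(x, y, carry_in, b)
    internal = sum(c[1:b])               # carries into positions 1 .. b-1
    return ones(z) == ones(x) + ones(y) - internal - 2 * c[b] + c[0]


# Exhaustive check for 4-bit bytes.
assert all(count_identity_holds(x, y, c, 4)
           for x in range(16) for y in range(16) for c in (0, 1))
```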

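The above leads to an expression for $\Sigma(Z)$, where

$$\Sigma(Z) = \sum_{i=0}^{k-1} a(Z_i) 2^i + a(Z_k) 2^k ;$$

or

$$\Sigma(Z) = \sum_{i=0}^{k-1} a(X_i) 2^i + \sum_{i=0}^{k-1} a(Y_i) 2^i - \sum_{i=0}^{k-1} a(\mathrm{int}\ C_i) 2^i - 2^k C_{k-1}^b + C_0^0 + 2^k [a(X_k) + a(Y_k) - a(C_k)]$$

The count $a(C_k)$ is the total count of carries $C_k^j$ for $1 \le j \le b$, since $C_k^0 = C_k^b$ in the modulo $2^b-1$ addition of the check bytes $X_k$ and $Y_k$; i.e.:

$$a(C_k) = \sum_{j=1}^{b} C_k^j = a(\mathrm{int}\ C_k) + C_k^b$$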
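
The part of the expression that covers the k data bytes can be checked with a byte-serial adder model. The sketch below is illustrative only: the function names are assumptions, and the check-byte term $2^k [a(X_k) + a(Y_k) - a(C_k)]$ is deliberately left out.

```python
def ones(v: int) -> int:
    return bin(v).count("1")


def byte_serial_add(xs, ys, b, carry_in=0):
    """Add two operands given as lists of b-bit bytes, least significant first.

    Returns the sum bytes, the count of internal carries of each byte
    (carries into positions 1 .. b-1), and the carry out of the last byte.
    """
    zs, internal = [], []
    carry = carry_in
    for x, y in zip(xs, ys):
        z, count = 0, 0
        for j in range(b):
            total = ((x >> j) & 1) + ((y >> j) & 1) + carry
            z |= (total & 1) << j
            carry = total >> 1
            if j < b - 1:            # carry into position j + 1 <= b - 1
                count += carry
        zs.append(z)
        internal.append(count)
    return zs, internal, carry


def weighted(counts) -> int:
    """Counts weighted by 2**i, i being the byte index."""
    return sum(c << i for i, c in enumerate(counts))


def data_part_holds(xs, ys, b, carry_in=0) -> bool:
    k = len(xs)
    zs, internal, carry_out = byte_serial_add(xs, ys, b, carry_in)
    lhs = weighted([ones(z) for z in zs])
    rhs = (weighted([ones(x) for x in xs]) + weighted([ones(y) for y in ys])
           - weighted(internal) - (carry_out << k) + carry_in)
    return lhs == rhs


assert data_part_holds([4, 3, 9, 0, 11, 2, 10], [1, 2, 3, 4, 5, 6, 7], b=4)
```

The inter-byte carries cancel in the weighted sum, so only the carry into byte 0 and the carry out of byte k-1 remain.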

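Two cases need to be discussed separately:

(a) "One's" complement (modulo $2^{kb}-1$) addition of X and Y;

(b) "Two's" complement (modulo $2^{kb}$) addition of X and Y.

For "one's" complement, $C_{k-1}^b = C_0^0$ is the "end-around-carry", and the expression for $\Sigma(Z)$ is:

$$\Sigma(Z) = \sum_{i=0}^{k-1} a(X_i) 2^i + \sum_{i=0}^{k-1} a(Y_i) 2^i - \sum_{i=0}^{k-1} a(\mathrm{int}\ C_i) 2^i - 2^k C_{k-1}^b + C_{k-1}^b + 2^k a(X_k) + 2^k a(Y_k) - 2^k a(\mathrm{int}\ C_k) - 2^k C_k^b$$

The expression reduces to:

$$\Sigma(Z) = \Sigma(X) + \Sigma(Y) - [\Sigma(\mathrm{int}\ C) + 2^k (C_k^b + C_{k-1}^b) - C_{k-1}^b]$$

Here $\Sigma(\mathrm{int}\ C) = \sum_{i=0}^{k} a(\mathrm{int}\ C_i) 2^i$ collects the internal carries of all bytes, including the check byte. Taking the line-residue modulo $A = 2^{k+1}-1$ of $\Sigma(Z)$, we get:

$$A\,|\,\Sigma(Z) = A\,|\,\{ A\,|\,\Sigma(X) + A\,|\,\Sigma(Y) - A\,|\,[\Sigma(\mathrm{int}\ C) + 2^k (C_k^b + C_{k-1}^b) - C_{k-1}^b] \}$$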

This relationship shows that the line-residue of Z can be predicted from the line-residues of X and Y, as well as a line-residue computed from the internal carries formed during the summation of X and Y. The two "end-around" carries $C_{k-1}^b$ and $C_k^b$ also need to be included in the calculation.

Common-mode errors can occur if the carries are incorrectly determined. To avoid such errors, separate and independent carry-forming circuits need to be employed to form the carries for line-residue determination.

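For two's complement, the "correction signal" $C_{k-1}^b$ must be added (modulo $2^b-1$) to the modulo $2^b-1$ sum of inverse residue check bytes $X_k$ and $Y_k$ [AVIZ 73].

The expression for $\Sigma(Z)$ is now:

$$\Sigma(Z) = \sum_{i=0}^{k-1} a(X_i) 2^i + \sum_{i=0}^{k-1} a(Y_i) 2^i - \sum_{i=0}^{k-1} a(\mathrm{int}\ C_i) 2^i - 2^k C_{k-1}^b + C_0^0 + 2^k a(X_k) + 2^k a(Y_k) - 2^k a(\mathrm{int}\ C_k) - 2^k C_k^b + 2^k C_{k-1}^b$$

The expression is reduced to:

$$\Sigma(Z) = \Sigma(X) + \Sigma(Y) - [\Sigma(\mathrm{int}\ C) + 2^k C_k^b - C_0^0]$$

where $C_0^0 = 1$ only exists if one of the two operands is being complemented (in "two's" complement) simultaneously with the addition. Once again, we take the line-residue modulo $A = 2^{k+1}-1$ of $\Sigma(Z)$ as follows:

$$A\,|\,\Sigma(Z) = A\,|\,\{ A\,|\,\Sigma(X) + A\,|\,\Sigma(Y) - A\,|\,[\Sigma(\mathrm{int}\ C) + 2^k C_k^b - C_0^0] \}$$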

The difference between "two's" and "one's" complement cases is quite small with respect to computing the line-residue $(2^{k+1}-1)\,|\,\Sigma(Z)$.

In practical implementation of byte-serial arithmetic the "two's" complement addition (and subtraction) is strongly preferable because there is no "end-around-carry" that requires either a second addition or the generation of two "tentative" sums (with and without the end-around-carry).


6. Detection of Unidirectional and Bidirectional Errors

In this and the following sections 7 and 8, the error-detecting and error-correcting properties of the two-dimensional codes that were first presented in [AVIZ 83] are reviewed, illustrated, and extended to two and three adjacent lines.

Given a modulo $2^b-1$ inverse residue code, the undetectable unidirectional errors are those that have error values E congruent to zero modulo $2^b-1$, where


All other unidirectional errors will be detected; however, there are no error correction properties.

One bit-line determinate ("stuck line") faults that cause unidirectional errors will always be detected as long as the condition:

$$(k+1) < (2^b-1)$$

is satisfied. For two adjacent "stuck lines," the condition is:

$$3(k+1) < (2^b-1)$$

For m adjacent "stuck lines," the condition is:

$$(2^m-1)(k+1) < (2^b-1)$$

For the purpose of this discussion, lines 0 and b-1 are considered adjacent.

The PM (pattern miss) [AVIZ 81] percentages for "stuck line" faults remain very low after the left side of the inequalities above exceeds the limit that guarantees PM percentage of 0%. For example, for one "stuck line," when $k+1 = 2^b-1$ is reached, we have:

$$\mathrm{PM}(\text{Inv. Residue}) = 100/(2^{k+1})\ \%$$

since only one of the $2^{k+1}$ possible error patterns on the "stuck line" (all zeros → all ones, or vice versa) goes undetected. The situation is not as favorable with "stuck byte" faults, as discussed next.

There is one undetectable one-byte unidirectional error; it results when an all-zero byte $X_i$ is changed to an all-ones byte, or vice versa. The PM percentage for this "stuck byte" fault is $(100/2^b)\%$. Introduction of byte parity bits will detect only one of the two (stuck-on-one and stuck-on-zero) "stuck bytes"; the other one remains undetectable.

The "stuck byte" detection problem is fully solved by the use of two-dimensional inverse residue encoding. There is one additional check bit $x_i^b$ for each byte $X_i$ ($i = 0, \ldots, k$). The check bits $(x_k^b, \ldots, x_0^b)$ represent the modulo $2^{k+1}-1$ inverse line-residue $X^b$ of the operand X that is now interpreted as b lines $X^j$ ($j = 0, \ldots, b-1$) of k+1 bits length each. It is evident that every "stuck byte" now will be detected by the use of $X^b$ as long as the condition:

$$(b+1) < (2^{k+1}-1)$$

is satisfied. For two adjacent "stuck bytes," the condition is:

$$3(b+1) < (2^{k+1}-1);$$

for p adjacent "stuck bytes" it is:

$$(2^p-1)(b+1) < (2^{k+1}-1)$$

The bytes $X_0$ and $X_k$ are considered adjacent in this analysis.
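
The "stuck line" and "stuck byte" conditions share one form, which the following sketch evaluates for given k and b. The function name and argument names are assumptions introduced for illustration.

```python
def always_detected(n_adjacent: int, length: int, residue_bits: int) -> bool:
    """Guaranteed-detection condition (2**n - 1) * length < 2**residue_bits - 1.

    For stuck lines:  length = k + 1, residue_bits = b.
    For stuck bytes:  length = b + 1, residue_bits = k + 1.
    """
    return ((1 << n_adjacent) - 1) * length < (1 << residue_bits) - 1


# Seven data bytes of four bits each (k = 7, b = 4).
k, b = 7, 4
stuck_lines = [m for m in range(1, b + 1) if always_detected(m, k + 1, b)]
stuck_bytes = [p for p in range(1, k + 2) if always_detected(p, b + 1, k + 1)]
assert stuck_lines == [1]
assert stuck_bytes == [1, 2, 3, 4, 5]
```

With these parameters a single stuck line and up to five adjacent stuck bytes fall under the guarantee; beyond that the PM percentage is no longer necessarily zero.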

The two-dimensional inverse residue is clearly superior to the byte-parity encoding, since the "stuck byte" condition subsumes all other possible error patterns (double, quadruple, etc.) in the byte, while all "even error" patterns go undetected when byte parity is the only form of encoding.

In general, the remaining undetectable errors in the operand X are those that are missed by both checks: modulo $2^b-1$ over the bytes (not including the check line bits $x_i^b$), and modulo $2^{k+1}-1$ over the lines, with the check byte bits $x_k^j$ included in each line j. Most unidirectional errors are detectable; furthermore, the detection of bidirectional errors is significantly improved, as discussed below.

It has been noted that low-cost inverse residue codes are considerably less effective in detecting bidirectional errors due to indeterminate repeated-use faults [AVIZ 71a]. The addition of the line-residue (i.e., the second dimension of encoding) allows the detection of all bidirectional errors that affect a single line, as well as all bidirectional double errors affecting any two bits of the operand X. The double, quadruple, and other even "half-and-half" bidirectional errors on one line that were undetected by the byte check are now detected by the line check, while those in one byte are detected by the byte check.

The remaining undetectable bidirectional errors are those that are simultaneously undetectable by the byte check and the line check. An illustration is the quadruple error that changes Z to Z* as shown below:

    Z  =  0 1    →    Z*  =  1 0
          1 0                0 1


Here an even number of opposite-direction changes occurs simultaneously in the bytes and lines of the operand X. In general, all quadruple errors of this type (at four corners of a rectangle of bits within the operand X) are undetectable.
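
The rectangle case can be reproduced on the encoded operand of Example 1 (seven data bytes of four bits, check byte and check line included). The sketch below is an illustration under assumed names; a residue of 0 stands for the "all ones" check result.

```python
B, K = 4, 7   # bits per byte, number of data bytes


def check_results(bits):
    """bits[i][j] is digit j of byte i; j = B is the check line, i = K the check byte.

    Returns (byte residue mod 2**B - 1, line residue mod 2**(K+1) - 1).
    """
    byte_sum = sum(bit << j for row in bits for j, bit in enumerate(row[:B]))
    line_sum = sum(sum(row) << i for i, row in enumerate(bits))
    return byte_sum % ((1 << B) - 1), line_sum % ((1 << (K + 1)) - 1)


def flip_rectangle(bits, i1, i2, j1, j2):
    """Invert the four digits at the corners of a rectangle of bits."""
    out = [row[:] for row in bits]
    for i in (i1, i2):
        for j in (j1, j2):
            out[i][j] ^= 1
    return out


# Each row lists [line 0, line 1, line 2, line 3, check line] of one byte.
operand = [
    [0, 0, 1, 0, 1],   # byte 0
    [1, 1, 0, 0, 0],   # byte 1
    [1, 0, 0, 1, 0],   # byte 2
    [0, 0, 0, 0, 0],   # byte 3
    [1, 1, 0, 1, 0],   # byte 4
    [0, 1, 0, 0, 1],   # byte 5
    [0, 1, 0, 1, 0],   # byte 6
    [0, 1, 1, 0, 0],   # check byte 7
]

# Bytes 0 and 1 hold opposite values on lines 0 and 2, so inverting all
# four digits changes the operand but neither check result.
corrupted = flip_rectangle(operand, 0, 1, 0, 2)
assert corrupted != operand
assert check_results(operand) == check_results(corrupted) == (0, 0)
```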


7.  Correction  of  Single-Bit  and  Unidirectional 
Single-Line  Errors 

The introduction of the inverse line-residue $X^b$ also makes single-bit error correction possible. As shown in [AVIZ 71a], the low-cost inverse residue codes have the "partial error location" property. Therefore a single-bit error value $E_i^j = \pm 1$ ($0 \le j \le b-1$; $0 \le i \le k$) will produce a unique indication for line j in the modulo $2^b-1$ check and for the byte i in the modulo $2^{k+1}-1$ check, making a correction of $E_i^j$ possible in the operand X. The single-bit error $E_i^b = \pm 1$ that occurs in the check line b will produce the indication for byte i ($0 \le i \le k$) in the modulo $2^{k+1}-1$ check, but no error indication at all in the modulo $2^b-1$ check, since it does not include the bits of the check line. Correction of $E_i^b$ is therefore possible.

The correction property can be extended to most unidirectional single-line errors as follows. If we assume a determinate single-line fault on line j, the error values E(j) will fall into the range:

$$-2^j \sum_{i=0}^{k} E_i^j \le E(j) \le 2^j \sum_{i=0}^{k} E_i^j$$

The positive values will be due to a stuck-on-one (s-o-1) and negative values due to a stuck-on-zero (s-o-0). The actual byte check results will assume the values $C(j) = (2^b-1)\,|\,E(j)$, and as long as $(k+1) < (2^b-1)$ holds, all error values due to a s-o-1 fault will be detectable and have a unique byte check result $C_1(j)$ in the range

$$0 \le C_1(j) \le (2^b-1)\,|\,(k+1)2^j$$

Similarly, the error values due to a s-o-0 fault will have the byte check result in the range:

$$0 \le C_0(j) \le (2^b-1)\,|\,(-2^j)(k+1)$$

However, many other error patterns (on two or more lines) can produce the same values of check results, and error correction is not possible with the byte residue encoding alone.

To obtain single-line unidirectional error correction, we use the additional information provided by the line check result obtained from the inverse line-residue encoding. Given a byte check result $C(j)$ discussed above, we find its value to be N, represented by b bits $(N_{b-1}, \ldots, N_0)$.

First we form the hypothesis that N is due to a single-line stuck-on-one determinate fault on line j ($0 \le j \le b-1$). If the fault is in line j=0, then N(0) = N error bits $E_i^0 = 1$ in line 0 will produce the byte check result N. We determine the numbers N(j) of error bits $E_i^j = 1$ on lines $j = 1, \ldots, b-1$ respectively that would be needed to produce the byte check result N by end-around shifting N to the right b-1 times. The shifts will produce the numbers $N(1), \ldots, N(b-1)$ in succession.

The number of error bits $E_i^j = -1$ (due to a stuck-on-zero line) that would be needed to produce N(j) for any $0 \le j \le b-1$ is given by $(2^b-1) - N(j)$, that is, the "one's complement" of N(j). All values of N(j) and $(2^b-1) - N(j)$ that are greater than k+1 are discarded as impossible solutions.

To test the hypothesis that a given byte check result N is due to a single-line determinate fault, we use the line check result

$$R = (2^{k+1}-1)\,\Big|\,\sum_{i=0}^{k} \Big( \sum_{j=0}^{b} x_i^j \Big) 2^i$$

This result will contain N(j) digits $R_i = 1$ ($0 \le i < k+1$) if there is a single-line determinate (stuck-on-one) fault in the line j. The presence of each $R_i = 1$ indicates that the digit $x_i^j$ should be corrected by the 1→0 change.

The line check result R will contain $(2^b-1) - N(j)$ digits $R_i = 0$ ($0 \le i < k+1$) if there is a single-line determinate (stuck-on-zero) fault in the line j. The presence of each $R_i = 0$ indicates that the digit $x_i^j$ should be corrected by the 0→1 change.


Example  1:  Line  Correction 


Consider an operand X with seven bytes (k = 7) of 4 bits each (b = 4). Inverse-residue coding is used for the bytes (modulo $2^b-1 = 15$) and for the lines (modulo $2^{k+1}-1 = 255$). The encoded operand (following Figure 1) is shown below:

    check   line   line   line   line
    line     3      2      1      0

      1      0      1      0      0     byte 0
      0      0      0      1      1     byte 1
      0      1      0      0      1     byte 2
      0      0      0      0      0     byte 3
      0      1      0      1      1     byte 4
      1      0      0      1      0     byte 5
      0      1      0      1      0     byte 6
      0      0      1      1      0     check byte 7

The byte check result (modulo 15) is N = 1111, and the line check result (modulo 255) is R = 11111111. No errors are indicated.
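
The check byte and check line of this operand can be recomputed from the seven data bytes. The sketch below assumes the inverse residue is formed as the modulus minus the residue; the function name `encode` is hypothetical.

```python
def encode(data, b):
    """Return (check byte, check line) for k data bytes of b bits each.

    data[i] is byte i (least significant byte first). The check line is
    returned as a (k + 1)-bit integer whose bit i belongs to byte i.
    """
    k = len(data)
    byte_mod = (1 << b) - 1
    line_mod = (1 << (k + 1)) - 1
    check_byte = byte_mod - sum(data) % byte_mod          # inverse byte residue
    all_bytes = data + [check_byte]
    line_sum = sum(bin(v).count("1") << i for i, v in enumerate(all_bytes))
    check_line = line_mod - line_sum % line_mod           # inverse line residue
    return check_byte, check_line


# Data bytes 0..6 read from the table as (line 3, line 2, line 1, line 0).
data = [0b0100, 0b0011, 0b1001, 0b0000, 0b1011, 0b0010, 0b1010]
check_byte, check_line = encode(data, b=4)
assert check_byte == 0b0110          # check byte 7 of the table
assert check_line == 0b00100001      # check bits set for bytes 0 and 5
```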

Now assume a stuck-on-one line 2 and set all digits in line 2 to one. The new byte check result is N = 1001. The single-line determinate fault possibilities are:

    Stuck-on-One              Stuck-on-Zero

    N(0) = 1001 = 9           15 - N(0) = 6
    N(1) = 1100 = 12          15 - N(1) = 3
    N(2) = 0110 = 6           15 - N(2) = 9
    N(3) = 0011 = 3           15 - N(3) = 12

The values greater than k+1 = 8 are discarded, and the remaining possibilities are: line 2 (6 errors) or line 3 (3 errors) stuck-on-one, and line 0 (6 errors) or line 1 (3 errors) stuck-on-zero.

The new line check result is

    R = (R_7, ..., R_0) = 01111110

The six ones in R indicate that the "line 2 stuck-on-one" hypothesis is valid, and the corresponding six positions in line 2 are corrected by setting them to zero.
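
The hypothesis test of this example can be written out as a short procedure. The following is a sketch under stated assumptions rather than the paper's implementation: the function names are hypothetical, the received operand is given as integers, and a hypothesis is kept only if the marked digits of the candidate line actually hold the stuck value.

```python
def ones(v):
    return bin(v).count("1")


def rotate_right(n, b):
    """End-around shift of a b-bit number one position to the right."""
    return (n >> 1) | ((n & 1) << (b - 1))


def single_line_hypotheses(byte_values, check_line, b):
    """Return [(fault, line j, positions)] consistent with both check results.

    byte_values: the k + 1 received bytes (check byte last), b bits each.
    check_line:  received check line as an integer, bit i belongs to byte i.
    """
    width = len(byte_values)                         # k + 1
    byte_mod, line_mod = (1 << b) - 1, (1 << width) - 1
    n = sum(byte_values) % byte_mod                  # byte check result N
    line_sum = sum(ones(v) << i for i, v in enumerate(byte_values)) + check_line
    r = line_sum % line_mod                          # line check result R
    if n == 0 and r == 0:
        return []                                    # no error indicated
    one_pos = [i for i in range(width) if (r >> i) & 1]
    zero_pos = [i for i in range(width) if not (r >> i) & 1]

    found = []
    n_j = n                                          # N(0) = N
    for j in range(b):
        line = [(v >> j) & 1 for v in byte_values]
        # Stuck-on-one: N(j) ones in R, each marking a digit that is now 1.
        if n_j <= width and len(one_pos) == n_j and all(line[i] for i in one_pos):
            found.append(("s-o-1", j, one_pos))
        # Stuck-on-zero: (2**b - 1) - N(j) zeros in R, each marking a digit now 0.
        n0 = byte_mod - n_j
        if n0 <= width and len(zero_pos) == n0 and not any(line[i] for i in zero_pos):
            found.append(("s-o-0", j, zero_pos))
        n_j = rotate_right(n_j, b)                   # N(j + 1)
    return found


# Operand of Example 1 with line 2 stuck-on-one (bit 2 set in every byte).
received = [v | 0b0100 for v in [4, 3, 9, 0, 11, 2, 10, 6]]
assert single_line_hypotheses(received, 0b00100001, b=4) == [
    ("s-o-1", 2, [1, 2, 3, 4, 5, 6])
]
```

A single surviving hypothesis identifies the line and the positions to be corrected; more than one surviving hypothesis means the error is detected but cannot be corrected.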


The single-line, unidirectional error correction algorithm cannot be completed only in the cases in which two conditions occur simultaneously:

(a) More than one line is indicated by the occurrence of identical values of N(j) or of 15 - N(j) for two or more lines j of X.

(b) The correction pattern indicated by the line check result R is actually applicable to more than one line j of the operand X, i.e., the lines have all zeros (or all ones) in the positions to be corrected.

Example 2 below illustrates condition (a); the subsequent discussion deals with condition (b).

Example 2: Correction Ambiguity


Now assume that line 1 is stuck-on-zero. The byte check result is N = 0101, and the possibilities are:

    Stuck-on-One              Stuck-on-Zero

    N(0) = 0101 = 5           15 - N(0) = 10
    N(1) = 1010 = 10          15 - N(1) = 5
    N(2) = 0101 = 5           15 - N(2) = 10
    N(3) = 1010 = 10          15 - N(3) = 5

The remaining possibilities all point to five errors. The modulo 255 line check result is

    R = 00001101

The five zeros in R (positions 7, 6, 5, 4, 1) indicate a stuck-on-zero on line 1 or line 3. To resolve the ambiguity, we find that line 3 already has "1" digits in positions 6 and 4, and cannot be corrected there; therefore the stuck line must be line 1.


It is possible that both potential corrections could be carried out in Example 2 above; that is, both line 1 and line 3 could have zeros in positions 7, 6, 5, 4, 1. In such a case, the error has been detected, but a correction is not possible, since both conditions (a) and (b) occur simultaneously.


8. Two-Line and Three-Line Errors

A  more  critical  case  than  the  ambiguity  discussed  above 
would  be  that  of  a  mis-correction,  in  which  the  restored 
pattern  would  differ  from  the  original  one,  such  as  in 
the  case  of  triple  errors  encountered  by  the  Hamming 
SEC/DED  code. 

A  mis-correction  for  two-dimensional  inverse  residue 
codes  will  occur  if  the  bit  pattern  of  the  operand  X 
changes  in  more  than  one  line,  but  both  the  byte  check 
result  value  N  and  the  line  check  result  value  R  remain 
the  same  as  for  a  single-line  error.  This  will  happen 
when: 

(a) the byte check result is altered by $\pm c(2^b-1)$

(b) the line check result is altered by $\pm c(2^{k+1}-1)$

(c)  both  (a)  and  (b)  occur  simultaneously. 

In  cases  (a)  and  (b)  the  other  check  result  remains  un¬ 
changed. 

It is readily shown that a mis-correction cannot occur if only two adjacent lines (or bytes) are affected by the fault; the detection is guaranteed in all cases. When three adjacent lines (or bytes) are affected, a mis-correction can occur. The byte check result will be altered by $\pm(2^b-1)$ when the following changes are imposed on a correctable unidirectional single-line error pattern:

(a) two error bits from line j are moved one line to the right, causing a net change in N of $2(2^{b-1}) - 2 = 2^b - 2$;

(b) one error bit from line j is moved one line to the left, causing a net change in N of $2 - 1 = 1$.

The total change in N is then $(2^b-2) + 1 = 2^b-1$, and it will lead to a mis-correction if the following two conditions are satisfied:

(1) there are no further error changes, and

(2) the positions in line j that would be mis-corrected actually do contain correctable bit values.

An example of the conditions under which a mis-correction will occur is shown in Example 3 below.


Example 3: Conditions for Mis-Correction


Consider the encoded operand below (same format as in Example 1). Without changes, both N = 1111 and R = 11111111 are obtained.


line  line 

3  2 


line  line
1  0 


0  1 
1  0  1 
1  1 
0  1 
1  1  0 
0  1  0 
1  1  0 


1  1  0 
1  1  0 


byte  0 
byte  1 
byte  2 
byte  3 
byte  4 
byte  5 
byte  6 

check  byte  7 


The unidirectional (0→1) errors affect three adjacent lines (3, 2, 1) as shown, and impose exactly six changes. Now we get N = 1001 and R = 01111110. This is exactly the same condition as in Example 1, and the "line 2 stuck-on-one" hypothesis is validated, since bytes 1 through 6 contain ones in line 2. Setting those six bits to zero will cause a mis-correction.


9.  Conclusions 

It is concluded that the two-dimensional codes are very nearly 100% (except in the cases of ambiguity as illustrated in Example 2) single-line correcting, and full 100% double-adjacent-line detecting codes with respect to unidirectional errors. The probability of mis-correction in the case of three-adjacent-line unidirectional errors remains very low, since a very specific error pattern and original pattern of X must coincide to cause a mis-correction.

It has been shown that byte-serial arithmetic can be carried out with operands which are encoded in two-dimensional residue and inverse-residue codes. Two-dimensional encodings provide a very powerful error-detecting and a substantial error-correcting capability for byte-serial arithmetic. Promising application areas are systolic arrays, multiple-precision arithmetic, and high-speed array computing.

APPENDIX

Example 4: Line-Residue Checking

The byte-serial line-residue checking algorithm of Section 3 is illustrated below. The operand X is from Example 1, with the "line 2 stuck-on-one" error. Here m = 3 and k + 1 = 8.

    check line    0 0 1 0 0 0 0 1
    line 3        0 1 0 1 0 1 0 0
    line 2        1 1 1 1 1 1 1 1
    line 1        1 1 1 1 0 0 1 0
    line 0        0 0 0 1 0 1 1 0

    Σ(L)   =  0 1 0   0 1 1 1 1   1 0 0
    Σ(L)'  =  0 1 0   1 0 0 0 0   1 0 0

    S_2 S_1 S_0 plus the overflow bits:  1 0 0 + 0 1 0  →  C_3 = 0,  1 1 0

Σ(L)' = Σ(L) + 2^3, since m = 3.

Since C_3 = 0, Σ(L) is selected as the check result, with
S_2 S_1 S_0 = 1 1 0
Layout from a Topological Description
by
The interconnection topology of a circuit does not, in general, correspond to a planar graph. However, by encompassing the routing of a circuit in the specification, it is possible to obtain a planar characterization of the topology of a circuit. The planar topology of a circuit is formally defined and the use of specifications with planar topology for the layout of integrated circuits is examined. An applicative language (FP) is used to obtain circuit specifications with planar topology. The planar topology arises naturally out of the constructs used to specify the behavior of the circuit. An efficient mapping from planar topology to geometry is implemented. The problem of transforming the planar topology to minimize the interconnection complexity is addressed by exploiting the structural information of the specification as opposed to using only the planar topology.
Abstract


We  propose  to  develop  a  general  and  systematic  methodology  for  the  design  of 
matrix  solvers,  based  on  the  dependence  graph  of  the  algorithms.  A  fully-parallel  graph 
is  transformed  to  incorporate  issues  such  as  data  broadcasting  and  synchronization, 
interconnection  structure,  I/O  bandwidth,  number  and  utilization  of  PEs,  throughput, 
delay,  and  the  capability  to  solve  problems  larger  than  the  size  of  the  array.  The 
objective  is  to  devise  a  methodology  which  handles  and  relates  features  of  the  algorithm 
and  the  implementation,  in  a  unified  manner.  This  methodology  assists  a  designer  in 
selecting  transformations  to  an  algorithm  from  a  set  of  feasible  ones,  and  in  evaluating 
the  resulting  implementations. 

This  research  is  motivated  by  the  lack  of  an  adequate  design  methodology  for  matrix 
computations. Standard structures (systolic arrays) have been used for these implementations,
but they might be non-optimal for a particular algorithm. Reported systems
have  used  ad-hoc  design  approaches.  Some  design  methodologies  have  been  proposed, 
but  they  do  not  address  many  important  issues. 

A  preliminary  version  of  the  proposed  methodology  has  been  applied  to  algorithms 
for  matrix  multiplication  and  LU-decomposition.  The  approach  produces  structures 
which correspond to proposed systolic arrays for these computations, as well as structures
which exhibit better efficiency than those arrays. The results show that different
transformations  on  a  graph  may  lead  to  entirely  different  computing  structures.  The 
selection  of  an  adequate  transformation  is  thus  directed  by  the  specific  restrictions  and 
performance  objectives  imposed  on  the  implementation.  The  designer  can  identify  and 
manipulate  the  parameters  that  are  more  relevant  to  a  given  application. 


